ITEM SELECTION BY MEANS OF A MAXIMIZING FUNCTION


PAUL HORST 


The Procter and Gamble Company 
Cincinnati, Ohio 


A new item selection technique is presented which takes into 
account the intercorrelations of the items as well as their corre- 
lations with the criterion. The technique is regarded as superior to 
comparable techniques in that it is considered to achieve greater 
economy of time, greater objectivity of procedure, higher validity, 
and higher reliability. The mathematical theory underlying the 
method is developed. An approximate solution of the mathematical 
equations is suggested. An approximation procedure for the com- 
plete item selection technique is presented, based on the mathe- 
matical solution, but much simpler in procedure. The clerical opera- 
tions involved in the approximation procedure are outlined and il- 
lustrated on a sample worksheet. 

Numerous methods purporting to increase the validity of objec- 
tive tests have been published. Many of these are based on techniques 
of individual item analysis. Perhaps the most comprehensive report 
of existing item analysis techniques is that of Long and Sandiford.* 

It has long been known that the validity of a test is a function, 
not only of the correlation of each item with the criterion, but also 
of all the possible intercorrelations of the test items. Notwithstand- 
ing this knowledge, surprisingly little has been done in the way of 
developing item selection techniques which take into account the in- 
tercorrelations of the individual items. Professor H. A. Toops of 
Ohio State University has developed an item selection method, known 
as the “L-method”, which takes into account item intercorrelations, 
but so far as I know, this method is not readily available in published 
form. Furthermore, I understand that the method involves a great 
deal of labor. 

Another technique,† known as “The Method of Successive Residuals”,
is also based on a consideration of item intercorrelations, as
well as item-criterion correlations. While this method has yielded
results definitely superior to those ignoring item intercorrelations,
it also is time consuming.


*Long, John A. and Sandiford, Peter. “The Validation of Test Items”, Bul- 
letin No. 3, Department of Educational Research, University of Toronto. 

†Horst, Paul, “Item Analysis by the Method of Successive Residuals”,
Journal of Experimental Education II (March 1934) pp. 254-263.
Recently, however, I have developed a method which is based on
both inter-item and criterion correlations, which I regard as much
superior to the “Method of Successive Residuals” for the following
reasons:

1. Less Time Consuming. It requires only from one-third to 
one-half the time required by the other method. 

2. Greater Objectivity. For the “successive residuals” tech- 
nique to be practicable it is necessary to run the analysis on separate 
groups of from 40 to 50 items at one time. The grouping of these 
items is largely arbitrary. 

3. Higher Validity. In general the new method yields slightly
higher validity coefficients.

4. Higher Reliability. This point I have not verified experimentally,
but the mathematical rationale underlying the new method
suggests definitely that higher reliability may be expected than by
the “Method of Successive Residuals”.


I. Theoretical Solution 


Suppose that we have a test battery of n items, and that we have
responses on all these items from a population of N cases. We also
have an external criterion measure Y. Let us consider the problem of
selecting m items from the total group of n items so that when the
cases are scored on only these m items the scores will give the
maximum correlation with the criterion measures Y. We let
$X_1, X_2, \cdots, X_n$ be the scores on the individual items. If we
adopt the unit weight procedure for scoring the items, the X values
will be either zero or unity.

The score of individual e on the total battery of items is then 


$$S_e = X_{e1} + X_{e2} + \cdots + X_{en} \tag{1}$$


or simply the total number of items answered correctly. 
The correlation between the total test scores and the criterion is 
given by: 

$$r_{ys} = \frac{N \sum YS - \sum Y \sum S}{N \sigma_y \sqrt{N \sum S^2 - \left(\sum S\right)^2}} \tag{2}$$

If we let

$M_y$ = mean of criterion scores

$M_s$ = mean of total test scores

equation (2) may be written:

$$r_{ys} = \frac{\sum YS - M_y \sum S}{\sqrt{N}\,\sigma_y \sqrt{\sum S^2 - M_s \sum S}} \tag{3}$$

By means of equation (1), equation (3) may be further modified to:

$$\sqrt{N}\,\sigma_y\, r_{ys} = \frac{\sum Y (X_1 + X_2 + \cdots + X_n) - M_y \sum (X_1 + X_2 + \cdots + X_n)}{\sqrt{\sum S (X_1 + X_2 + \cdots + X_n) - M_s \sum (X_1 + X_2 + \cdots + X_n)}} \tag{4}$$

If we define

$\sum Y_k$ = the sum of the criterion scores for all who answered item
k correctly,

$\sum S_k$ = the sum of the total test scores for all who answered item
k correctly,

$N_k$ = the number who answered item k correctly,

$r_{ys}$ = the correlation between the criterion and total test scores
based on all n items,

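equation (4) may be written:

$$\sqrt{N}\,\sigma_y\, r_{ys} = \frac{\left(\sum Y_1 - M_y N_1\right) + \left(\sum Y_2 - M_y N_2\right) + \cdots + \left(\sum Y_n - M_y N_n\right)}{\sqrt{\left(\sum S_1 - M_s N_1\right) + \left(\sum S_2 - M_s N_2\right) + \cdots + \left(\sum S_n - M_s N_n\right)}} \tag{5}$$
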
If we let

$$u_k = \sum Y_k - M_y N_k \tag{6}$$

$$v_k = \sum S_k - M_s N_k \tag{7}$$


equation (5) becomes

$$\sqrt{N}\,\sigma_y\, r_{ys} = \frac{\sum_{k=1}^{n} u_k}{\sqrt{\sum_{k=1}^{n} v_k}}$$

or simply

$$C\, r_{ys} = \frac{\sum_{k=1}^{n} u_k}{\sqrt{\sum_{k=1}^{n} v_k}} \tag{8}$$

since $\sqrt{N}\,\sigma_y$ is a constant independent of the test items.
Equation (8) is the basic formula from which our item selection
technique will be developed.
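
As a rough numerical check of equations (6) through (8), the following sketch computes the u and v values from a small made-up response matrix and confirms that the ratio in (8), divided by the constant √N σ_y, reproduces the ordinary product-moment correlation between criterion and total score. The function names and the data are illustrative assumptions and are not part of the original procedure.

```python
import math


def u_v_values(responses, criterion):
    """Return (u, v) per equations (6) and (7).

    responses: list of rows, one per case, each a list of 0/1 item scores.
    criterion: list of criterion scores Y, one per case.
    """
    n_cases = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]      # S_e, equation (1)
    mean_y = sum(criterion) / n_cases             # M_y
    mean_s = sum(totals) / n_cases                # M_s
    u, v = [], []
    for k in range(n_items):
        passed = [e for e in range(n_cases) if responses[e][k] == 1]
        n_k = len(passed)                         # N_k
        u.append(sum(criterion[e] for e in passed) - mean_y * n_k)
        v.append(sum(totals[e] for e in passed) - mean_s * n_k)
    return u, v


def pearson_r(a, b):
    """Ordinary product-moment correlation, used only as a cross-check."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))


# Hypothetical data: 5 cases, 3 items.
responses = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
criterion = [5, 3, 4, 9, 1]

u, v = u_v_values(responses, criterion)
ratio = sum(u) / math.sqrt(sum(v))                # right side of (8)
n = len(criterion)
mean_y = sum(criterion) / n
sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in criterion) / n)
r_from_ratio = ratio / (math.sqrt(n) * sigma_y)
r_direct = pearson_r(criterion, [sum(row) for row in responses])
```

Here `r_from_ratio` and `r_direct` agree to within rounding error, which is all that equation (8) asserts.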

For each item we have a u and a v value. Let us consider the
scatter diagram of the u values plotted against the v values. Suppose
the bounded area in figure (1) represents the area within which the
u and v values of the items are plotted. Let $u_l$ and $u_h$ represent
the lower and upper limits respectively of the u values, and $v_l$ and
$v_h$ the lower and upper limits of the v values. Next we divide the
area vertically and horizontally into equal intervals. Let $u_i$ be the
midpoint for the i’th interval of u, and $v_j$ the midpoint for the j’th
interval of v. Let $f_{ij}$ be the number of items whose u values fall in
the i’th interval of u and whose v values fall in the j’th interval of
v. The u values of all items whose v values lie in the j’th interval of
v will lie in the cross-hatched column of figure (1). The sum of the u
values of these items is given by $\sum_i u_i f_{ij}$, where the
summation is from the lowest to the highest class interval in the
column. The sum of all the u values will be the sum of all such column
summations or

$$\sum u = \sum_j \sum_i u_i f_{ij} \tag{9}$$

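By the same reasoning we may express the sum of all the v values by

$$\sum v = \sum_j \sum_i v_j f_{ij} \tag{10}$$

Substituting (9) and (10) in (8),

$$C\, r_{ys} = \frac{\sum_j \sum_i u_i f_{ij}}{\sqrt{\sum_j \sum_i v_j f_{ij}}} \tag{11}$$
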
The total number of items represented within the bounded region in
figure (1) is simply the sum of the frequencies in the individual cells,
or

$$n = \sum_j \sum_i f_{ij} \tag{12}$$

Assume now that we can fit a bi-variate frequency surface

$$f = f(u, v) \tag{13}$$

to the scatter diagram represented by figure (1). Let f be asymptotic
to the (u,v) plane, and continuous with continuous first derivatives.
Substituting double integration for the double summation in
(12) we get

$$n = \int_{v_l}^{v_h} \int_{u_l}^{u_h} f(u, v)\, du\, dv \tag{14}$$


Logically the limits would be taken from $-\infty$ to $+\infty$ since the
fitted function has been chosen asymptotic to the (u,v) plane. However,
if the function gives a good fit to the experimental data, (14)
will approximate closely the integrations from $-\infty$ to $+\infty$.

We may also substitute for equation (11) 

$$C\, r_{ys} = \frac{\int_{v_l}^{v_h} \int_{u_l}^{u_h} u\, f(u, v)\, du\, dv}{\sqrt{\int_{v_l}^{v_h} \int_{u_l}^{u_h} v\, f(u, v)\, du\, dv}} \tag{15}$$

Equation (15) expresses the correlation between the criterion and 
scores based on the total number of items. 

Next suppose we select only a specified number of items, m, let 
us say. We want these m items to be those which give the maximum 
correlation with Y. We may assume without loss of generality that 
these are the first m items in the series. The correlation, then, may 
be indicated by 


$$C\, r_{ys}^{(m)} = \frac{\sum_{k=1}^{m} u_k}{\sqrt{\sum_{k=1}^{m} v_k^{(m)}}} \tag{16}$$

where

$$v_k^{(m)} = \sum S_k^{(m)} - M_s^{(m)} N_k \tag{17}$$

and the superscript (m) means that only the m items are involved in
the values $S_k^{(m)}$ and $M_s^{(m)}$.

We assume now that the items which will make (16) a maximum
are the same as will make

$$L = \frac{\sum_{k=1}^{m} u_k}{\sqrt{\sum_{k=1}^{m} v_k}} \tag{18}$$

a maximum. It will be noted that the ratio in (18) differs from (16)
only in that the superscript (m) has been dropped from $v_k$ in the
former. This means that in (18) the $v_k$ values are calculated on the
basis of all n items rather than only the m best items.

The assumption that those items which make (16) a maximum
will also make (18) a maximum is not necessarily true. However, it
is not possible to determine the values $v_k^{(m)}$ before we know which
are the best items. We therefore consider the problem of selecting those
m items whose u and v values, based on the entire group of n items,
will make (18) a maximum. First let us refer to figure (1). Evidently
there is some curve $\varphi(v)$ dividing the (u,v) points corresponding
to the items into two groups such that:


1. m points will be above this curve. 
2. The u and v values of these points will be the m pairs of
values which will make (18) a maximum.
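
For a handful of items the maximizing set in (18) can be found by trying every combination, which makes the target of the later approximation concrete. The sketch below is only an illustration under that assumption: the function name `best_subset` and the u and v values are hypothetical, and exhaustive search is not practicable for the several hundred items the method is meant for.

```python
import itertools
import math


def best_subset(u, v, m):
    """Exhaustively search for the m items maximizing sum(u) / sqrt(sum(v))."""
    best_value, best_items = None, None
    for items in itertools.combinations(range(len(u)), m):
        sum_v = sum(v[k] for k in items)
        if sum_v <= 0:
            continue  # ratio undefined; such sets are skipped by assumption
        value = sum(u[k] for k in items) / math.sqrt(sum_v)
        if best_value is None or value > best_value:
            best_value, best_items = value, items
    return best_items, best_value


# Hypothetical u and v values for four items; select the best two.
u = [4, 3, 2, 1]
v = [8, 2, 1, 4]
items, value = best_subset(u, v, 2)
```

In this example the item with the largest u is not chosen, because its large v inflates the denominator of (18).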


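We therefore consider a cylinder meeting the (u,v) plane at the
curve $u = \varphi(v)$ such that when $\varphi(v)$ is substituted for the
lower limits of the inside integrals of (14) and (15) the following
conditions will be satisfied:

$$m = \int_{v_l}^{v_h} \int_{\varphi(v)}^{u_h} f(u, v)\, du\, dv \tag{19}$$

and

$$\frac{\int_{v_l}^{v_h} \int_{\varphi(v)}^{u_h} u\, f(u, v)\, du\, dv}{\sqrt{\int_{v_l}^{v_h} \int_{\varphi(v)}^{u_h} v\, f(u, v)\, du\, dv}} = \text{max.} \tag{20}$$
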


Integrating with respect to u in (19) and (20) gives:

$$m = \int_{v_l}^{v_h} f_1[\varphi(v), v]\, dv \tag{21}$$

$$\frac{\int_{v_l}^{v_h} f_2[\varphi(v), v]\, dv}{\sqrt{\int_{v_l}^{v_h} f_3[\varphi(v), v]\, dv}} = \text{max.} \tag{22}$$

where $f_1$, $f_2$, and $f_3$ are known functions and $\varphi(v)$ is
unknown. These equations present the analytical formulation of the
problem. I suspect, however, that the determination of the function
$\varphi(v)$ would involve the calculus of variations, or the theory of
integral and functional equations. Unfortunately my familiarity with
these branches of mathematics is too limited to be useful in this
connection.

It is, of course, possible to express $\varphi$ as a power series in v
with unknown coefficients and then perform the integrations indicated by
(21) and (22). If we let

$$\varphi(v) = a_0 + a_1 v + a_2 v^2 + \cdots \tag{23}$$

we could write (21) in the form

$$m = F_1(a_0, a_1, \cdots) \tag{24}$$

and (22)

$$L = \frac{F_2(a_0, a_1, \cdots)}{F_3(a_0, a_1, \cdots)} \tag{25}$$

where all the F’s are known functions.
Setting up the function with the Lagrangian multiplier we get
from (24) and (25)

$$\psi = L - \lambda m \tag{26}$$

Differentiating (26) with respect to each of the a’s and equating to
zero we have

$$\begin{aligned}
\frac{\partial \psi}{\partial a_0} &= \psi_0(a_0, a_1, \cdots, \lambda) = 0 \\
\frac{\partial \psi}{\partial a_1} &= \psi_1(a_0, a_1, \cdots, \lambda) = 0 \\
&\;\;\vdots
\end{aligned} \tag{27}$$

Equations (27) together with equation (24) enable us to solve for the 
a’s. These would, of course, in general be non-linear, and would have 
a multiplicity of solutions. The solutions would have to be found by 
laborious and complicated procedures. The problem of determining 
which of the solutions gives the largest value for (22) would also be 
troublesome. Furthermore, the determination of the frequency func- 
tion f(u,v) is by no means a simple task. 

It is hoped that the foregoing analysis may stimulate mathema- 
ticians to a practical analytical solution of the problem. In the mean- 
time we shall outline an approximate solution based on considerations 
suggested by the analytical formulation, and having the pragmatic 
justification that satisfactory results are obtained by the method. 


II. Approximate Solution 


In practical situations the total number of items, n, is usually
determined by the total number of items which may be administered 
to a group in one or two sittings. This number will vary according to 
the type of items used. Ordinarily the number will not be greater 
than five or six hundred. To get an adequate sampling of material 
the number should be at least four or five hundred. 

It is also desirable to set a minimum for the final number of
selected items. If the number is too small, the reliability may be so
low that the validity of the selected items will be spuriously high. If
the items are all of the multiple choice type with from three to five
choices, it is reasonable to specify that the final test should have
from 125 to 175 items to insure adequate reliability.

First we shall select a number of items somewhat in excess of 
the number which we wish to retain in the final test. The reason for 
this becomes clearer after the method of selection is described. 

Suppose that in our first selection process we select from twenty 
per cent to thirty per cent more items than we wish to retain in the 
final test. Let us assume that we have a total of 500 items, and that 
the final test will have 140 items. In our first selection we select 180 
items, let us say. 

In order to determine how these 180 items shall be selected let 
us refer to figure (2). The bounded region has the same meaning as 
in figure (1). All items below the v-axis correlate negatively with the 
criterion Y, and all above correlate positively. All items to the left 
of the u-axis correlate negatively with the total test and those to the 
right correlate positively with the total test. 

First we reject all items below the v-axis, that is, all which
correlate negatively with the criterion. We know that in doing this we
shall not eliminate items which would not also be eliminated by the
curve $\varphi(v)$ in equations (21) and (22), provided m is sufficiently
small.

Now we know in general that the slope of $\varphi(v)$ shall not be
negative at any point between $v_l$ and $v_h$. This is merely saying
that, of two items having the same u value, the one with the lower v
value would be preferred. That is, if two items show the same degree of
relationship with the criterion, the one with the lower degree of
relationship with the total test is the more desirable.

We take then as an approximation to $\varphi(v)$ a curve composed of
two straight lines meeting at a point on the v-axis. One line shall 
be the portion of the v-axis to the left of this point. The other shall 
be a line of positive slope above the v-axis. The position of the point 
on the v-axis is determined by the following considerations. 

Suppose the point were taken at C, to the left of the u-axis, and
the line CD were drawn, thus giving us the curve HCD. The items in
the area CAE would then be eliminated, together with all the other
items below the curve HCD. But the items in CAE are all positively
correlated with the criterion Y and negatively with the test. This
means that if they were included in the summations in (18) the
denominator term would be decreased and the numerator term would
be increased. Therefore L would be increased. Consequently they
should not be eliminated, and the line CD is incorrect.

Suppose now the point were taken at F to the right of the u-axis,
and the line FG were drawn, thus giving us the curve HFG. Consider
the narrow shaded portion above the v-axis between A and F. If the
items above HFG were selected, those in the narrow shaded portion
would be included. But since the u values of these items are negligible,
their contribution to the numerator term of (18) would be negligible.
However, the contribution of their v values is not negligible as
compared with their u values. Hence their inclusion would tend to
reduce L and therefore they should not be included.
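
The two arguments above can be illustrated numerically. The sketch below starts from a hypothetical selected set and adds one item of each kind to the ratio in (18); all the figures are made up for illustration only.

```python
import math


def ratio(sum_u, sum_v):
    """Value of the ratio in (18) for given sums of the u and v values."""
    return sum_u / math.sqrt(sum_v)


# Hypothetical sums for an already selected set of items.
base = ratio(100.0, 400.0)

# Adding an item from region CAE: positive u, negative v.
with_cae_item = ratio(100.0 + 3.0, 400.0 - 10.0)

# Adding an item from the narrow strip between A and F: u near zero, v positive.
with_strip_item = ratio(100.0 + 0.1, 400.0 + 20.0)
```

The first addition raises the ratio above its base value and the second lowers it, in agreement with the reasoning that fixes the corner of the curve at the intersection of the axes.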

As a result of the preceding logic we take our point at the
intersection of the axes, and draw the line AB, thus giving us the curve
HAB. The slope of the line AB is such that the required number of
items, m, will be above the curve HAB. This slope may be determined
as follows. Referring to figure (3) we find the number of items lying
to the left of the u-axis and above the v-axis. These items correlate
positively with the criterion and negatively with the total test. We
indicate their number by c. All of these c items will be selected. If
m is the total number of items we wish to select in the first process,
we must select e items lying in the quadrant to the right of the u-axis
and above the v-axis where

$$e = m - c$$



First we calculate the ratios $u/v$ for all the items in this quadrant.
Suppose that we find the e items with the largest ratios. Of these e
ratios let the smallest value be b. The line AB will therefore have the
slope b, since any point lying above the line will have a $u/v$ value
greater than b and any point lying below the line will have a $u/v$
value less than b. Therefore, in determining the slope of the line we
automatically select the desired e items.
I have found that the line AB usually lies between the lines AC
and AD in figure (3) where the slope of the former is $u_h/v_h$ and the
slope of the latter is $\tfrac{1}{2}\, u_h/v_h$. Since $u_h$ and $v_h$ are
simply the highest u and v values given by the data, it is easy to
calculate the two ratios. These ratios are useful in facilitating the
selection of the desired e items. The procedure is as follows. We count
the total number of items whose $u/v$ ratios are higher than $u_h/v_h$,
selecting all of these items. Let their number be g. We then make a
frequency distribution of all ratios lying between $u_h/v_h$ and
$\tfrac{1}{2}\, u_h/v_h$ and select the (e − g) items with the highest
ratios.
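
The selection rule for one process can be sketched as follows. This is a minimal illustration, not the clerical routine itself: it ranks the ratios directly instead of using the two guide lines AC and AD, the function name and data are hypothetical, and items with v exactly zero are grouped with the negative-v items, a case the text does not discuss.

```python
def select_by_slope(u, v, m):
    """Select the m items above the curve HAB; return (indices, slope b).

    Items with negative u are rejected. Items with non-negative u and
    v <= 0 are always selected (the c items; v == 0 is an assumption).
    The remaining e = m - c items are those with the largest u/v ratios
    among the items with positive v.
    """
    kept = [k for k in range(len(u)) if u[k] >= 0]
    automatic = [k for k in kept if v[k] <= 0]            # the c items
    candidates = [k for k in kept if v[k] > 0]
    e = m - len(automatic)
    if e <= 0 or e > len(candidates):
        raise ValueError("m is outside the range this sketch handles")
    ranked = sorted(candidates, key=lambda k: u[k] / v[k], reverse=True)
    chosen = ranked[:e]
    b = u[chosen[-1]] / v[chosen[-1]]                     # slope of line AB
    return sorted(automatic + chosen), b


# Hypothetical u and v values for eight items; the last lies left of the u-axis.
u = [23, 42, 7, 29, 14, 23, 9, 5]
v = [54, 31, 12, 39, 7, 43, 36, -3]
selected, b = select_by_slope(u, v, 4)
```

With these values the item having negative v is taken automatically, and the slope b is the smallest ratio among the three further items chosen.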

In our second and final selection process the procedure is as
follows. First we calculate new scores based on only the items selected
in the first process, and from equation (7) calculate new v values for
these items. Since the u values do not change, we consider the
scatter diagram of the new v values plotted against the original u
values.

In general, it will be found that the new v values tend to shift 
toward the right so that certain of the v values which were negative 
when calculated on the basis of all the items will be positive when 
calculated from only the m selected items. It is because of this phe- 
nomenon that in the first selection process twenty per cent or thirty 
per cent more items are selected than are desired in the final test. 

In the second selection process we again consider a line drawn 
from the intersection of the axes, with a slope such as to eliminate 
the twenty or thirty per cent excess selected in the first approxima- 
tion. The procedure is precisely the same as for the first selection 
process. 
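
The recalculation of the v values for the second process may be sketched as below: scores are recomputed on the retained items only, and equation (7) is applied to the new scores. The function name and the small response matrix are hypothetical and serve only to show that the v values shift when the item set changes.

```python
def v_values(responses, items):
    """Equation (7), with scores based only on the given item indices."""
    n_cases = len(responses)
    scores = [sum(row[k] for k in items) for row in responses]
    mean_s = sum(scores) / n_cases
    v = {}
    for k in items:
        passed = [e for e in range(n_cases) if responses[e][k] == 1]
        v[k] = sum(scores[e] for e in passed) - mean_s * len(passed)
    return v


# Hypothetical data: 5 cases, 3 items.
responses = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
v_all = v_values(responses, [0, 1, 2])    # first process: all items
v_new = v_values(responses, [0, 1])       # second process: items 0 and 1 only
```

The u values need no such recalculation, since they depend only on the criterion and on the responses to the item itself.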

It is possible to improve the validity of the final test by going 
through more than two selection processes, and eliminating a smaller 
proportion of items in each process. However, each selection cycle 
increases the number of degrees of freedom represented in the final 
selection, thereby contributing toward its instability. 

I have not determined mathematically how the degrees of freedom
vary with successive selection processes, nor do I know whether
this function can be calculated mathematically. However, it is
probably safe to say that not more than one selection process should be
carried through for every 50 cases in the total population. I have used
two selection processes for a population of 80 cases, but I seriously
doubt whether a population smaller than this would justify more than
one selection process, certainly not more than two.


III. Clerical Procedure


The item selection method here described is most efficiently
carried out by means of punched cards, a sorter, and a tabulator.
Forty-five column equipment has proved adequate for all the analyses I
have conducted so far. Both raw criterion and raw test scores may be
reduced to class interval values of 0 to 9 inclusive. Thus a single
column will suffice for each of the two measures. If two successive selection
processes are carried out there will be two sets of test scores, one 
based on all the items, the other on only those selected in the first 
process. This makes a total of three columns for criterion and test 
scores. If code numbers are to be punched, additional columns will 
be required, usually not more than three. This leaves a total of 39 
columns for punching the item responses. Each of the 12 positions in 
each of these 39 columns corresponds to a given item. If the individual 
has answered an item correctly the corresponding position in the card 
is punched. Otherwise it is left blank. Including the two positions at 
the top, the card would accommodate 468 items. It is possible, how- 
ever, to write the code numbers on the card instead of punching them, 
thus giving space for a total of 504 items. 
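
The column arithmetic of the preceding paragraph can be checked in a few lines. The constant names are illustrative; the figures are those given in the text.

```python
COLUMNS = 45          # columns on the card
POSITIONS = 12        # punch positions per column, including the two at the top
SCORE_COLUMNS = 3     # criterion, total test score, first-process score
CODE_COLUMNS = 3      # code number columns, if the codes are punched

# Item capacity when code numbers are punched into the card.
items_with_punched_codes = (COLUMNS - SCORE_COLUMNS - CODE_COLUMNS) * POSITIONS

# Item capacity when code numbers are written on the card instead.
items_with_written_codes = (COLUMNS - SCORE_COLUMNS) * POSITIONS
```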


First Selection Process 

After the items have been coded, the first step is to punch the 
corresponding position in the card for each item correctly answered 
by the individual. Then the class interval values of the criterion and 
total test scores are punched in the columns assigned to them. The 
column for the scores based on the items selected by the first process 
is left blank for the time being. 

After the cards have been punched they are sorted on the first 
item, in order to segregate the cards of all persons who answered the 
first item correctly. These cards are then tabulated on the criterion 
and test score columns by means of a printing tabulator. Thus the 
tabulator prints the sum of the criterion measures and the sum of the 
test measures for all who answered the first item correctly. These 
operations are repeated for each of the items, giving a printed pair 
of sums for each item. 

The table is a section of a typical worksheet for an item selection 
project in which two selection processes are carried through. The 
procedure is as follows. 

In column 0 write the code numbers of the items. 

In column 1 enter the frequency of correct responses for each 
item, that is, the number of persons answering each item correctly. 

Next calculate the mean of the criterion measures $M_y$ and the
mean of the total test score measures $M_s$. These means are obtained
by running all the cards through the tabulator, tabulating on the 








240 PSYCHOMETRIKA 


criterion and score columns and dividing the two printed totals by N, 
the number of persons. Carry the division to two decimal places. 

Enter $M_y$ at the top of column (3) and $M_s$ at the top of column
(6).

To get column (3) multiply column (1) by $M_y$. It is unnecessary
to carry decimal figures in this column.

As the printed sums corresponding to a tabulation for each item 
sort are obtained, enter these in columns (2) and (5), column (2) 
being the criterion summations and column (5) the total test score 
summations. 

To get column (4) subtract column (3) from column (2). The 
values in column (4) are designated u, and each u is proportional to 
the product moment of the corresponding item with the criterion. If 
the product moment is negative for any item then it has a negative 
correlation with the criterion and should be eliminated from further 
calculations. Therefore all rows of the worksheet with negative val- 
ues in (4) are crossed out. 

For all subsequent instructions it is assumed the calculations will 
not be made for rows which have been crossed out. 

To get column (6) multiply column (1) by $M_s$.

To get column (7) subtract column (6) from column (5). The 
values in column (7) are designated v. 

To get column (8) divide each entry in column (4) by its corres- 
ponding entry in column (7), provided the corresponding entry in 
column (7) is positive in sign. If it is negative in sign leave the 
corresponding position in column (8) blank. 
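
The worksheet arithmetic for columns (3) through (8) can be sketched as a single function. The function names are hypothetical, and rounding halves upward is an assumed convention; the text says only that decimals need not be carried in column (3).

```python
import math


def round_half_up(x):
    """Round to the nearest whole number, halves upward (assumed convention)."""
    return math.floor(x + 0.5)


def worksheet_row(n_k, sum_y, sum_s, mean_y, mean_s):
    """Return columns (3), (4), (6), (7), (8) for one item.

    Columns (6) to (8) are None when u is negative (the row is crossed out);
    column (8) is None when v is not positive (the position is left blank).
    """
    col3 = round_half_up(n_k * mean_y)
    u = sum_y - col3                                  # column (4)
    if u < 0:
        return col3, u, None, None, None
    col6 = round_half_up(n_k * mean_s)
    v = sum_s - col6                                  # column (7)
    ratio = round(u / v, 2) if v > 0 else None        # column (8)
    return col3, u, col6, v, ratio


# Entries for items 000 and 001 of the sample worksheet (M_y = 4.53, M_s = 4.51).
row_000 = worksheet_row(58, 286, 316, 4.53, 4.51)
row_001 = worksheet_row(29, 115, None, 4.53, 4.51)
```

Item 001 has a negative u, so its row is crossed out and the remaining columns are not computed.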

Now suppose that you wish to select 180 items in your first se- 
lection process. Suppose also that you have a total of 500 items, and 
that 150 of these have negative entries in column (4). This means 
that you have crossed out 150 rows and that you have 350 items left. 
Suppose that 50 of these have negative entries in column (7). This 
means that these 50 items correlate positively with the criterion and 
negatively with the total test. These 50 items will definitely be se- 
lected in the first selection process. This leaves entries for 300 items 
in column (8). From these 300 items you will select 180 − 50, or 130
items in the following manner. 

1. Find the highest entry in column (4). Suppose it is 45. 

2. Find the highest entry in column (7). Suppose it is 70.

3. Divide the highest entry in column (4) by the highest entry
in column (7). In the example given this would be 45/70 = .64.

4. Divide this ratio by 2. In the example this would be .32. 
5. In column (8) count the number of entries which are greater
than the ratio described in (3). In the example this would be all
entries greater than .64. The items corresponding to all such entries
will be selected. Suppose there are 80 such items.

You have now selected 50 + 80, or 130 items. You wish to
select 180 − 130, or 50 more.

First make a frequency distribution of all those entries in col- 
umn (8) which lie between the ratios described in (3) and (4) above. 
In the example this would be all entries between .32 and .64. Suppose 
there are 90 such items. From your frequency distribution count off 
the 50 highest tallies and note the value of the lowest of these 50 
highest. Suppose this is .45. 

Select all items whose entries in column (8) are not less than 
this lowest value (in the example, .45). 

You have now selected the 180 items which you were to select 
through the first process. 
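
The counting procedure of steps 1 to 5 and the frequency distribution that follows can be sketched as below. The function name and the list of ratios are hypothetical, and the sketch simply raises an error when the two guide ratios fail to bracket the cutoff, a case the text does not discuss.

```python
def clerical_cutoff(ratios, n_needed, u_max, v_max):
    """Return the lowest column (8) entry to accept.

    ratios:   column (8) entries for items with positive u and v.
    n_needed: number of items still to be selected from column (8).
    u_max, v_max: highest entries in columns (4) and (7).
    """
    upper = u_max / v_max                             # step 3
    lower = upper / 2                                 # step 4
    n_above = sum(1 for r in ratios if r > upper)     # step 5
    remaining = n_needed - n_above
    band = sorted((r for r in ratios if lower <= r <= upper), reverse=True)
    if remaining <= 0 or remaining > len(band):
        raise ValueError("the two guide ratios do not bracket the cutoff")
    return band[remaining - 1]


# Hypothetical column (8) entries; six items are wanted.
ratios = [2.00, 1.35, 0.74, 0.68, 0.62, 0.58, 0.56, 0.53, 0.43, 0.30, 0.25]
cutoff = clerical_cutoff(ratios, 6, 45, 70)
selected = [r for r in ratios if r >= cutoff]
```

Four of these ratios exceed 45/70, so the remaining two are counted off from the top of the band between the guide ratios.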

The Second Selection Process. 

Suppose you wish to select 140 items for the final test. This 
means that you must eliminate 40 of the 180 items which you selected 
by the first process. 

First recalculate the test scores on the basis of the selected 180 
items. Perhaps the most convenient method for doing this is as fol- 
lows: 

Make a stencil card by punching out all positions on a blank
card which correspond to the 180 selected items. To get a person’s
score on the 180 items simply superimpose the stencil card on his
card and count the number of holes common to both cards.
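
In modern terms the stencil count is the size of a set intersection. The sketch below assumes each card is represented as the set of item codes punched in it; the codes shown are hypothetical.

```python
def stencil_score(card, stencil):
    """Count the punched positions common to a person's card and the stencil."""
    return len(set(card) & set(stencil))


stencil = {2, 3, 4, 6, 7}        # hypothetical codes of the selected items
card = {0, 2, 3, 5, 7, 9}        # hypothetical items one person answered correctly
score = stencil_score(card, stencil)
```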

Convert the new scores to class interval scores and punch these 
into the column provided on the cards. 

Now run all the cards through the tabulator, and find the total
of all the new scores. Divide this total by N to get $M_s^{(t)}$ and
enter the new mean at the top of the column (10). The superscript (t),
(t = 180), means that you are using only the 180 items.

To get column (10) multiply column (1) by $M_s^{(t)}$. Do not,
however, make calculations for crossed out rows.

Next sort the cards on only the 180 items. For each sort, tabu- 
late on only the new score, and enter the printed totals in column (9). 

To get column (11) subtract column (10) from column (9). The
entries in this column are designated by $v^{(t)}$.

To get column (12) divide each entry in column (4) by its
corresponding entry in column (11), provided the corresponding position
in (11) is neither blank nor negative. For each blank position
or negative entry in column (11) leave a blank position in column
(12).

The final 140 items are selected from column (12) in the same 
manner as were the first 180 from column 8. 


SAMPLE WORKSHEET FOR ITEM SELECTION BY MEANS OF
A MAXIMIZING FUNCTION

Columns: (0) code number; (1) N_k; (2) ΣY_k; (3) M_y N_k, with
M_y = 4.53; (4) u = (2) − (3); (5) ΣS_k; (6) M_s N_k, with M_s = 4.51;
(7) v = (5) − (6); (8) u/v = (4) ÷ (7); (9) ΣS_k^(t); (10) M_s^(t) N_k;
(11) v^(t) = (9) − (10); (12) u/v^(t) = (4) ÷ (11).

(0)    (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)   (11)   (12)
000     58    286    263     23    316    262     54    .43
001     29    115    131    -16
002     72    368    326     42    356    325     31   1.35    374    326     48    .88
003     48    224    217      7    228    216     12    .58    231    217     14    .50
004     43    224    195     29    233    194     39    .74    230    195     35    .83
005     60    261    272    -11
006     42    204    190     14    196    189      7   2.00    199    190      9   1.56
007     83    399    376     23    417    374     43    .53    414    376     38    .61
008     71    331    322      9    356    320     36    .25
009     78    373    353     20    418    352     66    .30
010     50    255    227     28    267    226     41    .68    267    227     40    .70
011     35    188    159     29    205    158     47    .62    194    159     35    .83
012     93    450    421     29    471    419     52    .56    472    421     51    .57
013     85    379    385     -6
014     64    292    290      2
015     61    281    276      5
016     80    397    362     35
017     40    188    181      7
018     62    301    281     20
A variation of the centroid method is described and illustrated.
By the application of new rules for reflecting signs, it may be pos- 
sible to reduce to insignificance the factor loadings of tests showing 
insignificant correlation (original or residual) with clusters of tests 
having relatively high intercorrelations. As a result, a factor com- 
mon to any one of these clusters may be revealed by the centroid 
method itself with little or no need for rotation of axes or further 
calculations. 


The centroid method described by Thurstone has attracted wide- 
spread attention and promises to be of great value. The method is a 
laborious one, however, and, more important, usually does not yield 
significant results until by further laborious and, as yet, by no means 
routine calculations, the factorial matrix first yielded by the centroid 
method is transformed into a more meaningful one. It may be of in- 
terest, therefore, to describe a simple labor-saving modification of the 
procedure outlined by Thurstone (a procedure which will hereafter be 
referred to as the orthodox method). The saving in labor is enor- 
mous, in that the transformation of the first obtained factorial mat- 
rix, by rotation of the axes, may, when only rough approximations 
are desired, be entirely eliminated. More important, however, than 
the saving in labor, is the fact that the immediate results are mean- 
ingful. 

The factor loadings obtained by the method to be described are 
very little affected by the order in which the factors are removed, and 
therefore do not show the tapering off in the variance accounted for 
which is characteristic of the orthodox centroid method. Indeed, the 
number of factors required may, as in the illustration to be given, be 
determined before any one of the factor loadings has been calculated.
When this is the case, the modification here described may
even eliminate the need of calculating the residuals, a lengthy
operation which is required by the orthodox centroid method before
taking out each new factor after the first, and which constitutes the
greater part of the labor involved in that method.

The modification to be described is, at present, far from a rigid 
method, which can be proved to give good results in all cases. It is 
regarded simply as a useful approximation procedure. It may be ob- 
served, however, that whatever its shortcomings, it yields a factori- 
zation which accounts for the correlations almost as well as the or- 
thodox method. It has the same theoretical justification as the cen- 
troid method in other forms, since the modification consists only in 
different sign-changing rules. It follows that even if one proceeds to 
rotate the axes, he has not lost time in using the present method. On 
the contrary, he should save time, since the modified method affords 
an excellent basis for any further hypotheses which may be needed. 

Irrespective of the usefulness of the method, it is of interest in 
that it brings out with striking clearness the fact that the grounds 
for postulating a factor are merely that the tests show a certain 
clustering as regards their intercorrelations, that is, that tests fall 
into groups so that the intra-group correlations are relatively high 
as compared with the inter-group correlations. 

In Thurstone’s first description of the centroid method, he rec- 
ommended changing to positive, by the proper reflection of the test- 
variable, the signs of all correlations in the column having the larg- 
est sum. This procedure has certain defects, among them the fact 
that it sometimes leads to factor-loadings greater than 1.0. The prin- 
ciple later adopted was that “every trait the majority of whose cor- 
relations are negative is to be reversed in sign”. The sign-reversing 
process is continued, before the extraction of each factor, until at 
least one-half of the correlations, shown by each variable, are posi- 
tive or zero.* The guiding consideration behind this procedure ap- 
pears to be the assumed desirability of accounting for as much as 
possible of the residual variance by each successive factor, an assump- 
tion which has to be reconciled, by later calculations, with the fact 
that the factor-loadings should not be dependent upon the order in 
which the factors are removed. The sign-changing method here proposed,
which for convenience will hereafter be termed the “suppression”
method, aims to free the calculated loadings with any one factor from
the effect of the correlations due to other factors, that is, to
neutralize or suppress the effect of all factors other than the one
being removed. This result is accomplished, approximately, when about
one-half of the correlations due to the other factors are made positive
in sign and one-half of them made negative. In terms of the centroid
method, the sums of columns (of the correlational matrix) for tests
not containing the factor being removed should equal zero. While it
may be impossible to bring about exactly this result by any
sign-changing rules, it is believed that the procedure illustrated
below will ordinarily tend rather definitely to do so. The procedure
may best be described in connection with an illustration.

*L. L. Thurstone, The Vectors of Mind, 1935, p. 96.

Our illustrative problem begins with an assumed set of factor
loadings on the part of 20 tests as indicated by Table I.


TABLE I
ASSUMED FACTOR LOADINGS
(decimal points omitted)

Test    I   II  III   IV   h2
  1     0    0    0    5   25
  2     0    7    6    0   85
  3     8    2    0    0   68
  4     0    9    0    0   81
  5     2    0    7    0   53
  6     1    0    0    6   37
  7     0    0    4    5   41
  8     0    0    7    6   85
  9     4    8    0    0   80
 10     4    0    0    7   65
 11     0    3    8    0   73
 12     5    0    7    0   74
 13     0    5    0    6   61
 14     7    0    0    0   49
 15     1    0    0    8   65
 16     0    0    6    0   36
 17     5    4    0    0   41
 18     3    0    8    0   73
 19     9    0    0    0   81
 20     0    7    0    6   85





A correlational matrix was made from this table, on the usual
assumption that the correlation between any two tests equals the sum
of the products of their loadings with each factor (e.g., $r_{3,9}$ =
(.8 × .4) + (.2 × .8) + (.0 × .0) + (.0 × .0), or +.48). This
correlation matrix was then subjected to analysis by the orthodox
centroid method. The results of this analysis are shown in Table II.
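
The construction of the correlational matrix may be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions: the loadings of Table I are stored in tenths, so that every correlation comes out as an exact integer in hundredths, and the names `LOADINGS`, `correlation`, and `ranked_pairs` are inventions of the sketch, not part of the original procedure.

```python
# Assumed loadings from Table I, in tenths (decimal points omitted), tests 1..20.
LOADINGS = [
    (0, 0, 0, 5), (0, 7, 6, 0), (8, 2, 0, 0), (0, 9, 0, 0), (2, 0, 7, 0),
    (1, 0, 0, 6), (0, 0, 4, 5), (0, 0, 7, 6), (4, 8, 0, 0), (4, 0, 0, 7),
    (0, 3, 8, 0), (5, 0, 7, 0), (0, 5, 0, 6), (7, 0, 0, 0), (1, 0, 0, 8),
    (0, 0, 6, 0), (5, 4, 0, 0), (3, 0, 8, 0), (9, 0, 0, 0), (0, 7, 0, 6),
]


def correlation(i, j):
    """Correlation of tests i and j (numbered from 1), in hundredths.

    It is the sum of the products of the two tests' loadings on each factor;
    with i == j it gives the communality h2.
    """
    return sum(a * b for a, b in zip(LOADINGS[i - 1], LOADINGS[j - 1]))


# All 190 distinct pairs, arranged from the largest correlation to the smallest.
ranked_pairs = sorted(
    ((correlation(i, j), i, j) for i in range(1, 21) for j in range(i + 1, 21)),
    key=lambda entry: -entry[0],
)

for r, i, j in ranked_pairs[:12]:
    print(f"{i:2d} and {j:2d}  +.{r:02d}")
```

The twelve lines printed should agree with the twelve highest correlations of Table III, although tied pairs may appear in a different order.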
TABLE II
ANALYSIS YIELDED BY THE ORTHODOX CENTROID METHOD

Test      I       II      III      IV
  1     +.265   +.233   +.057   +.376
  2     (illegible)
  3     (illegible)
  4     +.403   -.287   -.708   -.007
  5     +.487   +.213   +.163   -.480
  6     +.362   +.212   +.119   +.444
  7     +.483   +.425   +.081   +.109
  8     +.650   +.624   +.120   -.066
  9     +.558   -.541   -.453   -.016
 10     +.544   +.059   +.289   +.495
 11     +.565   +.275   -.185   -.535
 12     +.624   +.023   +.329   -.491
 13     +.539   +.097   -.368   +.451
 14     +.364   -.487   +.424   -.006
 15     +.444   +.284   +.132   +.565
 16     +.336   +.291   +.048   -.410
 17     +.444   -.490   -.060   -.001
 18     +.580   +.191   +.224   -.542
 19     +.440   -.563   +.504   -.030
 20     +.606   +.017   -.489   +.436

Residuals
Magnitude             Frequency
(illegible)
+.010 to +.029            21
-.010 to +.009           128
-.030 to -.011            35
-.050 to -.031             5
Total                    190
Mean (illegible); σ = .013











It is clear from Table II that the loadings obtained by the ortho- 
dox method, as was to be expected, bear very little resemblance to the 
original loadings. Much further work would be necessary before the 
analysis could be regarded as at all satisfactory. 

We may now describe a method which gives a close approxima- 
tion to the true loadings without rotation of axes. The first step is to 
pick out four or five tests which are highly correlated with each other, 
which will be called the reference tests for factor I, and also the tests, 
here termed non-factor tests, which show insignificant correlation 
with the factor reference tests. This may be accomplished in any 
one of a variety of ways. One of the simplest will here be described. 
All the correlation coefficients are first arranged in order of magni- 
tude from largest to smallest. Such a list, printed with identifying 
test-numbers in a parallel column, is readily obtained by Hollerith 
machines. The twelve highest correlations in this list, arranged in or- 
der, are given in Table III since they will be needed for future ref- 
erence. 

TABLE III
THE TWELVE HIGHEST CORRELATIONS YIELDED
BY THE ASSUMED FACTOR LOADINGS

Tests            r
 3 and 19      +.72
 4 and  9      +.72
13 and 20      +.71
12 and 18      +.71
 2 and 11      +.69
11 and 18      +.64
 4 and 20      +.63
 2 and  4      +.63
14 and 19      +.63
 5 and 18      +.62
10 and 15      +.60
 5 and 12      +.59


We next mark off the highest third and the lowest third of the
entire list of 190 correlations. Whether one uses a third or some
other fraction does not appear to be very important. The formulation
of the best rule in this connection will require further investigation.
For each of the tests, in the present illustration 20 in number, we
then list the number of every covariable with which it shows a high
correlation (among the highest third) and of every test with which it
shows a low correlation (among the lowest third). This procedure
results in two simple tables, Tables IV and V.
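
The tabulation just described can be sketched as follows. This is a minimal illustration, not the Hollerith routine itself: the function name `high_low_lists`, the treatment of ties at the cutting points (left to the sort order), and the small set of correlations used to exercise it are all assumptions made for the example.

```python
def high_low_lists(corr, fraction=1 / 3):
    """Build the two listings corresponding to Tables IV and V.

    corr maps a pair of test numbers (i, j) to their correlation.
    Returns (high, low): for each test, the tests with which it shows a
    correlation in the highest / lowest `fraction` of the whole list.
    """
    ranked = sorted(corr, key=corr.get, reverse=True)  # pairs, highest r first
    k = round(len(ranked) * fraction)                  # size of each marked-off part
    high, low = {}, {}
    for table, chosen in ((high, ranked[:k]), (low, ranked[len(ranked) - k:])):
        for i, j in chosen:
            table.setdefault(i, []).append(j)
            table.setdefault(j, []).append(i)
    return high, low


# Hypothetical correlations among four tests (six pairs in all).
toy = {(1, 2): .72, (1, 3): .05, (1, 4): .10,
       (2, 3): .02, (2, 4): .40, (3, 4): .63}
high, low = high_low_lists(toy)
print(high)   # tests paired by the two highest correlations
print(low)    # tests paired by the two lowest correlations
```

With the 190 correlations of the present problem, `fraction=1/3` marks off 63 correlations at each end of the list.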


TABLE IV 


TESTS WITH WHICH EACH TEST SHOWS A HIGH CORRELATION 














12 8 4 5 6 7 8 9 10 11 12 18 14 15 16 17 18 19 20 
jo 12 19 9 18 15 8 7 4 15 2 18 20 19 10 11 9 12> 8 Is 
10 4 1420 12 10151120 618 5 15 38 6 18 38 1114 4 
9 1% 2 11. 201018 2 20 8 11 412 #8 519 512 9 
20 9138 8 18 Zit - 8.6 8 0 17 20 $ 4&2 S47 2 
18 1217 2 & 5 8 1812 19 9 18 Irs. 29 5 
12 16 1518 1916 2 8 i 2 16 10 10 
8 219 7 16 6 1 8 
5 10 1 38 2 6 
16 16 14 
13 20 
13 


6 
TABLE V


TESTS WITH WHICH EACH TEST SHOWS A LOW CORRELATION 








228 4 6 6 9 8 8 40 41°22 28 14245 06 17 18 «19 «20 
9 10 8 18 18 2 9 9 8 11 10 18 18 11 2 20 7 18 13 18 
9°19 8 201119 3 3 2 19 20 22 246 1956 20 lt 
BeBe w4i4AAanwie4 kh 4B 64H 8 4 tk CB 
m7 45 4°40 146 3° 446 3264 4462001 9891 1 8 & 
5 14 ‘{ 417 6 16 16 4 1. 36 
4 1 16 14 14 1 14 13 15 20 «#5 
5) 19 7 14 16 

12 6 4 13 4 

2 15 1 10 1 

11 5 6 

16 1 3 

14 14 1 





The next step is to select a group of tests which are in high
agreement as regards the tests with which they show high correlations.
We note first the two tests which show the highest correlation.
The highest correlation is tied between two pairs, tests 3 and 19 and
tests 4 and 9. Choosing the first of these pairs, we consult Table IV
to determine whether tests 3 and 19 show a high agreement in the
tests with which they have a high correlation. Since all tests in the
column for test 3 occur in the column for test 19, it is obvious that
they do so. All tests appearing in both columns, as well as tests 3
and 19, are therefore used as reference tests for factor I. The list
thus obtained includes tests 3, 9, 12, 14, 17, and 19. The object is
simply to obtain a small group of tests which correlate highly with
each other. Other, and perhaps better, but hardly more simple, ways
of doing this are easily imagined. One such method has been
described by Holzinger.¹ It seems unnecessary to discuss in detail all
these possible methods.

The assumption now made is that the group of tests which corre- 
late highly with each other, providing they meet certain criteria to 
be mentioned later, have at least one common factor. A second 
assumption is then made, namely, that a test which shows no signif- 
icant correlation with any one, or more, of these tests has no signif- 
icant correlation with their common factor (axiomatic only on the 
hypothesis of the absence of significant negative factor-loadings). 


¹Holzinger, K. J., and Swineford, F., Preliminary Report on the
Spearman-Holzinger Unitary Trait Study, Statistical Laboratory, Dept. of
Education, University of Chicago, 1936, No. 7, pp. 2-5.
The next step, then, is to find a set of tests which show
insignificant correlations with a set of highly intercorrelated tests.
From Table V, we find that the test showing the largest number of low
correlations with the reference tests is test 1, with 6 low correlations.
Other tests showing at least two low correlations with the reference
tests are numbers 7, 8, 16, 4, 13, 20, 2, and 11. There is thus
obtained a list of non-factor or low-correlation tests.
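
The two selections just described, reference tests from the listing of high correlations and non-factor tests from the listing of low correlations, may be sketched as below. The function names, the exclusion of the reference tests themselves from the non-factor list, and the small listings used as input are assumptions of the sketch; the listings are hypothetical and are not the entries of Tables IV and V.

```python
from collections import Counter


def reference_tests(high, a, b):
    """The pair (a, b) plus every test appearing in both of their columns."""
    common = set(high.get(a, [])) & set(high.get(b, []))
    return sorted(common | {a, b})


def non_factor_tests(low, reference, min_count=2):
    """Tests showing at least `min_count` low correlations with the reference tests."""
    counts = Counter(t for ref in reference for t in low.get(ref, []))
    return sorted(t for t, c in counts.items()
                  if c >= min_count and t not in reference)


# Hypothetical excerpts of a high-correlation and a low-correlation listing.
high = {3: [19, 14, 12], 19: [3, 14, 12, 17], 12: [3, 19, 18]}
low = {3: [1, 7, 16], 12: [1, 4], 14: [1, 7], 19: [1, 2]}

refs = reference_tests(high, 3, 19)
print(refs)                          # the pair and their common high correlates
print(non_factor_tests(low, refs))   # tests low with two or more reference tests
```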

Now if the factor prominent in the reference tests is to be cor- 
rectly isolated, then the tests which correlate insignificantly with the 
reference tests should show loadings of nearly zero with this factor. 
Consequently, in terms of the centroid method, the sums of the col- 
umns in the correlational matrix should be nearly zero in the case of 
these low-correlation tests. This last consideration gives us our sign- 
changing rule, namely, to change signs of the low-correlating tests 
so as to minimize their column sums (or, if preferred, the squares of 
the sums). So far as has been determined, it is a matter of indiffer- 
ence which of these non-factor variables are reflected. The only 
rule is to choose for reflection whichever tests may be needed to re- 
duce to a minimum the mean of the arithmetical deviations, neglect- 
ing sign, or the squares thereof, of the sums of the columns of the 
list of non-factor tests. Of course a thoroughly systematic and rigid 
procedure for determining which of the low-correlating variables to 
reflect would be time-consuming. The simple method here employed, 
however, appears to give reasonably satisfactory results. It consists 
of changing at once the signs of a considerable number of the lowest 
of the low-correlating tests, and then continuing the process until the 
sum of the non-factor columns begins to increase. After enough tests 
have been reversed to reach the minimum mean column sum, then the 
correlation matrix is scanned to determine whether substituting some 
other test for one of those already reversed would give better results. 
The effect of reversing one additional or one less variable upon the 
column sums may be determined by a few moments of computation 
(consisting of adding or subtracting from the column-sum of each 
variable twice the magnitude of the correlation of that variable with 
the one whose sign is changed). One or two trials will usually suffice 
to give sums so low that they will result in very small factor loadings 
for the non-factor tests. Once this result is obtained, the sign-changes 
are regarded as satisfactory. 
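
The sign-changing rule can be illustrated by a short sketch. It is offered only under assumptions: the text proceeds by a few hand trials, whereas `best_reflection` below simply tries every combination of the non-factor tests, which is practicable only for a small list; the function names and the five-variable matrix are likewise invented for the example. Variables are numbered from 0, and the diagonal cells hold whatever values the worker supplies.

```python
from itertools import combinations


def reflect(R, tests):
    """Copy of the matrix R with the signs of the given variables reversed."""
    S = [row[:] for row in R]
    for t in tests:
        for k in range(len(S)):
            if k != t:                      # the diagonal cell is left unchanged
                S[t][k] = -S[t][k]
                S[k][t] = -S[k][t]
    return S


def column_sum(R, j):
    return sum(R[i][j] for i in range(len(R)))


def mean_abs_column_sum(R, non_factor):
    """Mean of the non-factor column sums, neglecting sign."""
    return sum(abs(column_sum(R, j)) for j in non_factor) / len(non_factor)


def best_reflection(R, non_factor):
    """Exhaustive search for the reflections minimizing the mean column sum."""
    best, best_score = (), mean_abs_column_sum(R, non_factor)
    for size in range(1, len(non_factor) + 1):
        for subset in combinations(non_factor, size):
            score = mean_abs_column_sum(reflect(R, subset), non_factor)
            if score < best_score - 1e-12:
                best, best_score = subset, score
    return best, best_score


# Hypothetical matrix: variables 0 and 1 share one factor, 2, 3 and 4 another.
R = [[0, .6, 0, 0, 0],
     [.6, 0, 0, 0, 0],
     [0, 0, 0, .4, .4],
     [0, 0, .4, 0, .4],
     [0, 0, .4, .4, 0]]
print(best_reflection(R, [2, 3, 4]))
```

Reversing one variable changes the column sum of every other variable by twice its correlation with the reversed variable, which is the short computation mentioned above; in the example, reversing variable 2 carries the column sum of variable 3 from .8 to zero.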

For example, in the present problem, with respect to factor I,
we first change from positive to negative the signs of variables 1, 4,
7, 8, 13, 16, and 20. The column sums are then obtained. Then the
variables 2 and 11 are reversed, and the effect of reversing each and
both of them upon the mean column sums is noted. The results are
shown in Table VI.


TABLE VI
DATA TO DETERMINE TEST REFLECTIONS FOR FACTOR I

Reference tests: 3, 9, 12, 14, 17, and 19.
Non-factor tests: 1, 7, 8, 16, 4, 13, 20, 2, and 11.

Tests reversed                 Mean of non-factor column sums
All nine non-factor tests           1.113
All except No. 11                    .533
All except No. 2                     .307
All except Nos. 2 and 11             .724








Using the above data, the signs of all the non-factor tests except 
number 2 are reversed, since that procedure yields the lowest mean 
column-sum for the non-factor tests. The consequent changes in the 
signs of the correlations are made, and the loadings of all tests with 
factor I then calculated in the usual manner. 
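
For readers who wish to see the computation "in the usual manner" set out, the following is a minimal sketch of the ordinary centroid formula: each loading is the column sum of the reflected matrix divided by the square root of the grand total, the sign of each reflected test being restored afterwards. The function name and the assumption that the diagonal cells already hold communality estimates are ours; the text does not say which diagonal values were used.

```python
import math


def centroid_loadings(R, reflected=()):
    """Centroid loadings for one factor.

    R         -- square correlation matrix, communality estimates on the diagonal
    reflected -- indices (from 0) of the variables whose signs are reversed
    """
    n = len(R)
    sign = [-1.0 if j in reflected else 1.0 for j in range(n)]
    # Column sums of the matrix after the chosen variables are reflected.
    col_sums = [sum(sign[i] * sign[j] * R[i][j] for i in range(n))
                for j in range(n)]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("grand total must be positive after reflection")
    root = math.sqrt(total)
    # Restore the original direction of each reflected variable.
    return [sign[j] * col_sums[j] / root for j in range(n)]


# Check on a hypothetical one-factor matrix built from known loadings.
a = [.8, -.6, .4]
R = [[x * y for y in a] for x in a]
print(centroid_loadings(R, reflected={1}))
```

For a matrix produced by a single factor, as in this check, the formula returns the generating loadings (apart from rounding).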

After determining the loadings with factor I, one might proceed 
to calculate the residuals and then extract the second factor by the 
application of the above described procedure to the residuals. New 
residuals might be calculated for each factor. In most problems, such 
a procedure is probably the most satisfactory. It is possible, however, 
to calculate the loadings for a number or even all of the factors from 
the original correlations. In the case of the present problem, all the 
factors were easily determined without calculating any residuals. All 
that is necessary is to discover additional sets of factor reference 
tests. 

The reference tests for the remaining factors were chosen in the 
same manner as those for factor I. Returning to the list of original 
correlations, tabulated in order from the highest to the lowest, it may 
be observed (Table III) that tests 4 and 9 show as high a correlation 
as do the tests 3 and 19 which were used in establishing the reference 
tests for factor I. We therefore next consult Table IV, which shows 
the tests between which high correlations exist, and find that all the 
tests correlating highly with test 4 also correlate highly with test 
9. As reference tests for factor II, therefore, are chosen tests 4 and 
9 and all the tests listed under the columns for both these tests in
Table IV, namely 2, 13, 17, and 20. A list of non-factor tests is then
obtained by copying from Table V all the tests appearing more than
once in the columns headed by the reference tests, that is, all the
tests which show very low correlations (in this case, under .08) with
the reference tests. The non-factor II tests thus obtained are tests
1, 5, 6, 7, 8, 10, 12, 14, 15, 16, 18, and 19. To calculate factor II, we
first make negative the signs of tests 1, 5, 7, 8, 12, 14, 16, 18, and 19
(those most frequently showing low correlations with factor II
reference tests). By trial, it is then found that the deviation from zero
shown by the sums of the non-factor tests is smaller if test 6 is made
negative and tests 8 and 18 are left positive. That change is
therefore made and the factor loadings then calculated as usual.

The same procedure is followed in determining the reference and
non-factor tests for the remaining factors. The next highest
correlation, +.71, is a tie between two pairs of tests. One of the tied
pairs consists of tests 13 and 20. Since tests 13 and 20 both occur as
reference tests for factor II, this correlation is not used in seeking
the reference tests for factor III. The other member of the tie is
therefore examined. It consists of tests 12 and 18, only one of which
has been used as a reference test and both of which occur among the
non-factor tests of factor II. Obviously, if the factors are to be
uncorrelated, it is desirable to determine the reference tests by
beginning in each case with two tests which correlate highly with each
other but one of which, at least, shows no significant correlation with
the reference tests of other factors. Table IV is therefore examined to
determine what tests correlate highly with both tests 12 and 18. It is
observed that these tests are numbers 2, 5, 8, 11, and 16. These,
together with 12 and 18, then, constitute the reference tests for factor
III. The non-factor tests for these reference tests, obtained from
Table V, are tests 1, 4, 6, 10, 13, 14, 15, 19, 20, each of which occurs
at least three times as a low-correlating test, and 3, 9, and 17, each of
which occurs twice.

The next highest correlation between tests both of which do not
occur among the reference tests of factors I, II, or III is the eleventh
highest, yielded by tests 10 and 15. By entering Table IV with these
two tests, one obtains as the reference tests for factor IV tests 1, 6,
7, 8, 10, 13, 15, and 20. For these reference tests Table V shows the
following non-factor tests: 2, 4, 11, 14, 16, and 19, the most
frequently occurring, and also 3, 5, 9, 12, 17, and 18. There are thus
obtained, by following perfectly objective rules, the following
reference and non-factor tests for four factors:


Factor   Reference Tests               Non-Factor Tests
I        3, 9, 12, 14, 17, 19          1, 2, 4, 7, 8, 11, 13, 16, 20
II       2, 4, 9, 13, 17, 20           1, 5, 6, 7, 8, 10, 12, 14, 15, 16, 18, 19
III      2, 5, 8, 11, 12, 16, 18       1, 4, 6, 10, 13, 14, 15, 19, 20; 3, 9, 17
IV       1, 6, 7, 8, 10, 13, 15, 20    2, 4, 11, 14, 16, 19; 3, 5, 9, 12, 17, 18

The fifty highest correlations in the original correlational matrix
consist exclusively of correlations between tests both of which occur
among the reference tests of some one of the preceding four factors.
This fact strongly suggests that there are only four factors. These
four factors are therefore taken out. When each of the factors is
taken out from the same correlational matrix, as in the present
instance, there is no guarantee that the factors will be strictly
orthogonal. However, there is little likelihood of significant
correlation between the factors if the non-factor tests for each factor
include reference tests of all other factors, and, vice versa, the
reference tests of any one factor include one or more of the non-factor
tests of each of the other factors. Such a relationship very definitely
characterizes the above listed sets of reference and non-factor tests.

As stated above, that combination of non-factor tests should be 
changed to negative which when so changed reduces the sum of the 
squares of the column-sums of the non-factor tests to the lowest ob- 
tainable minimum. In the present analysis we have been content to 
choose the best combination observed as a result of only a few trials. 
The tests reversed in sign to obtain the four sets of factor loadings 
were as follows: 


Factor   Tests Reversed
I        1, 4, 7, 8, 11, 13, 16, 20
II       1, 5, 6, 7, 12, 14, 15, 16, 19
III      1, 4, 6, 13, 14, 17, 19, 20
IV       2, ... (remaining entries illegible)


Each of the four sets of obtained factor loadings, given in Table
VII, correlates extremely well with the true loadings shown in Table
I. The coefficient of correlation between the obtained and the true,
or assumed, factor loadings is +.98 or +.99 in the case of each factor.
The mean of the residuals after removing these four factors is +.005,
and the standard deviation of their distribution .065. Smaller final
residuals could presumably be obtained by calculating residuals
before the extraction of each new factor. In the present problem,
however, such additional labor seems hardly worth while in view of the
excellent approximations obtained without it. Certainly the
agreement between the true loadings and those obtained by the suppression
method, with no rotation of axes and no calculation of residuals, is for
most purposes as high as could very well be desired.


TABLE VII
FACTORIAL MATRIX OBTAINED BY SUPPRESSION METHOD

Factors
Test      I       II      III      IV
  1     -.101   -.021    .000   +.528
  2     -.038   +.699   +.498   +.063
  3     +.785   +.188   +.068   -.045
  4     -.036   +.841   -.067   +.073
  5     +.258   +.051   +.728   -.045
  6     +.088   -.015   +.021   +.616
  7     -.081   +.017   +.455   +.525
  8     +.012   +.030   +.676   +.528
  9     +.330   +.739   -.052   +.008
 10     +.286   -.004   +.067   +.662
 11     +.038   +.363   +.756   +.010
 12     +.529   +.070   +.734   -.051
 13     -.135   +.716   -.032   +.648
 14     +.718   -.015   +.089   -.071
 15     -.024   -.149   +.045   +.780
 16     +.012   +.038   +.616   -.037
 17     +.491   +.395   +.011   +.016
 18     +.340   +.074   +.816   -.039
 19     +.869    .000   +.101   -.037
 20     -.099   +.652   -.006   +.611

Residuals
Magnitude             Frequency
+.150 to +.169             1
+.130 to +.149             3
+.110 to +.129             7
+.090 to +.109             7
+.070 to +.089             9
+.050 to +.069            15
+.030 to +.049            19
+.010 to +.029            34
-.010 to +.009            23
-.030 to -.011            21
-.050 to -.031            19
-.070 to -.051            10
-.090 to -.071             8
-.110 to -.091             7
-.130 to -.111             1
-.150 to -.131             1
-.170 to -.151             1
-.190 to -.171             3
-.210 to -.191             1
Mean = +.005; σ = .065.

An objection which will be raised to the suppression method is 
that it is not axiomatic that tests showing a relatively high inter- 
correlation have at least one factor in common to a significant degree. 
Certainly reliance should not be placed on the method where this 
assumption cannot be made. There are a number of criteria by which 
one may test the assumption. Among the most useful are the fol- 
lowing: 

1. Satisfaction of the tetrad-difference criterion. Removal of 
any factor of course increases the likelihood of this criterion being 
satisfied by some four of any matrix of residual correlations. 

2. High intercorrelation of the reference tests. Since no test
can have a loading of over .707 with each of two independent factors,
and rarely has one as high as .9 with any one broad factor (on
account of its specific factor), it is practically certain that four tests
all of whose original intercorrelations are higher than +.55 or +.60
possess to a significant degree a common factor.

3. Knowledge of the nature of the tests. For example, if the 
reference tests selected by the methods described above should in the 
case of any one factor turn out to be all verbal tests, such as anal- 
ogies, opposites, anagrams, and artificial language, or all computation 
tests, or, in the case of motivation tests with rats, all obstruction 
tests, one may well feel justified in assuming a common factor. 

4. A high agreement between the reference tests as regards 
their correlation with other tests. 

In the case of four sets of data, ranging from 20 to 52 variables, 
it has been found possible to verify by one or the other of these cri- 
teria, usually several of them, the assumption that the tests chosen as 
reference tests have a common factor. Since it has also been found 
that the number of factors obtained is as small as with the orthodox 
method and that the residuals are of the same order of magnitude, it 
would appear that the factorization obtained would in any case be 
about as satisfactory as that immediately yielded by the orthodox 
sign-changing rules, and could be subjected to such further trans- 
formations as were deemed desirable. 

It is of interest to observe that the centroid method and the
tetrad method both lead to substantially the same analysis in the case
of the present problem. There need be no conflict, then, in the
outcome of the two methods. In the present instance the tetrad method
is by far the more economical of time and leads to a perfect solution.
The perfection of the solution is due to the use of assumed zero
loadings instead of small positive or negative loadings, and the fact
that there are only four factors. Combinations of four tests may be
found which will exactly meet the tetrad criterion. For example,

$$r_{2,9} \times r_{4,20} - r_{2,20} \times r_{4,9} = .56 \times .63 - .49 \times .72 = 0.$$

It follows that the loading of any of these four tests with their
common factor may be obtained by equations of the form

$$\text{loading of test 2} = \sqrt{\frac{r_{2,9} \times r_{2,20}}{r_{9,20}}},$$

which, in the case of test 2, gives the loading with the factor labelled
II above. Having thus determined the loadings of four “reference
tests” of one of the factors, the loading of any other test may be
quickly determined. Since there are only four factors, at least one of
the four tests satisfying the tetrad-criterion must, in view of the fact
that they have only one common factor, have a loading with but a
single factor. It follows that the loading of each of the other tests
with this factor may be determined by the exceedingly simple
procedure of listing its correlations with the four reference tests,
dividing each by the corresponding reference test’s factor-loading, and
choosing the lowest quotient. The obtained loading will be exact.

For example, the loading of test 3 with factor II is obtained as
follows:


Reference Tests               2      4      9     20
Loading with Factor II      +.7    +.9    +.8    +.7
r, with Test 3              +.14   +.18   +.48   +.14
Quotients                   +.2    +.2    +.6    +.2


Factor loading, factor II, test 3 = +.2
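
The tetrad computations of this example are easily checked. The sketch below is an illustration only; it takes the assumed loadings of the five tests concerned from Table I (in tenths), and the names `LOAD`, `r`, `triad_loading`, and `ref_loading` are assumptions of the sketch.

```python
import math

# Assumed loadings (tenths) of the tests used in the example, from Table I.
LOAD = {2: (0, 7, 6, 0), 3: (8, 2, 0, 0), 4: (0, 9, 0, 0),
        9: (4, 8, 0, 0), 20: (0, 7, 0, 6)}


def r(i, j):
    """Correlation implied by the assumed loadings."""
    return sum(a * b for a, b in zip(LOAD[i], LOAD[j])) / 100


def triad_loading(i, j, k):
    """Loading of test i on the factor common to tests i, j and k."""
    return math.sqrt(r(i, j) * r(i, k) / r(j, k))


# Tetrad difference for the four reference tests of factor II.
tetrad = r(2, 9) * r(4, 20) - r(2, 20) * r(4, 9)

refs = (2, 4, 9, 20)
ref_loading = {}
for i in refs:
    j, k = [t for t in refs if t != i][:2]   # any two other reference tests
    ref_loading[i] = triad_loading(i, j, k)

# Loading of test 3 with factor II: the lowest of the four quotients.
quotients = [r(3, k) / ref_loading[k] for k in refs]
print(round(tetrad, 6), [round(q, 2) for q in quotients], round(min(quotients), 2))
```

The quotients agree with those of the worked example, and the lowest of them, .2, is the assumed loading of test 3 with factor II.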


The fact that the tetrad method leads so easily to a solution of 
the present problem does not, of course, detract from the value of the 
centroid method. The tetrad method is extremely difficult and even 
impossible of application in cases where the centroid method is quite 
satisfactory. 

In using the suppression method on actual data, it has been our 
practice to extract each factor as obtained and to calculate each suc- 
ceeding factor from the residuals, treating each new set of residuals 
by the same rules as described above (except, of course, that one re- 
moves only one factor from any one set of residuals instead of four 
factors all from the same set of correlations as has been done in the 
present instance). Only in this manner can one be sure of obtaining 
orthogonal factors. Before removing any factor, we give all variables 
their original proper signs. The effect of the various operations is 
thus much more easily noted. If the original correlations are all posi- 
tive, then at first most of the sums of the residuals are positive; when 
the analysis is complete the column sums are about evenly divided be- 
tween positive and negative and the sums are all small. It is our ex- 
perience that it pays to spend several hours, if necessary, to deter- 
mine that combination of low-correlating variables whose signs need 
to be reversed in order to reduce to a minimum the column sums of 
the non-factor tests. Particular care should be taken with the first 
factor. When the work is done carefully the suppression method itself
will take longer than the orthodox method; but when the analysis is
completed it should be meaningful. In fact, each factor as obtained
should make sense. It is not necessary to obtain all the factors
before any of them can, as the result of elaborate computations, be
made meaningful. One or two rotations of axes may, however, still
be desirable. If made, they will be made to meet a definite criterion,
that of reducing to a minimum the loadings of a set of predetermined
non-factor tests.
THE DETERMINATION OF ITEM DIFFICULTY WHEN
CHANCE SUCCESS IS A FACTOR


J. P. GUILFORD 
University of Nebraska 


The evaluation of the level of difficulty of a test item is or- 
dinarily derived from the proportion of a specified population pass- 
ing or failing the item. With items that have a limited number of 
alternative responses there must be a correction in this proportion 
to make allowance for chance success. A table of corrected propor- 
tions is given for different numbers of alternatives varying from 
two to eight. 


Among the procedures used in the many present-day studies of 
test items the evaluation of the level of difficulty becomes an increas- 
ingly prominent feature. Typically, such an evaluation rests upon the 
proportions of the individuals in tested populations that pass or fail 
to pass the items. The proportion of passes (or of failures) is taken 
to represent the area above or below a certain deviate of a normal 
frequency distribution. The deviate is taken as a standard measure 
of the difficulty of the item with reference to the average ability of 
the population.* Thus, an item that is passed by 20 per cent of the
population in question has a scale value of +0.84σ. This assumes that
not only the 50 per cent of the population below the median level of
ability but also an additional 30 per cent of those above the median
failed to pass it. This 30 per cent of the frequency surface lies
between the median and plus 0.84σ. An item that is passed by 80 per
cent would by similar reasoning have a scale position at -0.84σ. The
scaling of items for difficulty is thus simply rationalized and simply
conducted.
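
The scale values just quoted may be verified with the normal distribution supplied in the Python standard library. The function name `difficulty_deviate` is an assumption of this sketch; the computation is simply the normal deviate below which the proportion of failures lies.

```python
from statistics import NormalDist


def difficulty_deviate(p_pass):
    """Scale value of an item, in σ units, from the proportion passing it.

    The deviate is the point of the unit normal curve below which the
    proportion of failures (1 - p_pass) falls.
    """
    return NormalDist().inv_cdf(1 - p_pass)


for p in (0.20, 0.50, 0.80):
    print(f"p = {p:.2f}   scale value = {difficulty_deviate(p):+.2f}σ")
```

A proportion of exactly 0 or 1 has no finite deviate, and `inv_cdf` raises an error in that case.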

This line of thought, however, implies the type of test item in
which success by pure chance is infinitely small. It cannot apply to
the customary true-false item or to the multiple-choice item, or to any
similar item in which a few alternative responses are offered to the
testee. In the latter type of item a certain finite number of passes is
to be expected even if the testees have no ability whatever to master
the particular item. In the most extreme case, with two alternatives,
we should expect completely ignorant or otherwise incompetent
individuals to be correct 50 per cent of the time, in the long run. This
proportion of passes means something quite different as regards item
difficulty than does a proportion of .50 when chance success is
remotely small, or even when there is one chance in five of being
correct by sheer luck. A proportion of .50 for the successes when chance
is practically nil means that the item is of median difficulty for the
population. A proportion of .50 for the successes when there are only
two alternative responses to the item should mean a degree of
difficulty that is extremely great for the population. Obviously, if we
are to evaluate items of the latter type for difficulty on the same scale
as we do those of the former type, we must introduce some radical
adjustment in order to make the scale values at all comparable.

*J. P. Guilford, Psychometric Methods, New York: McGraw-Hill, 1936, p.
438.

For test items with limited numbers of alternative responses
some correction for chance success is therefore in order. Such a
correction, it is reasonable to suggest, should be in line with a similar
correction made in an individual’s test score when it is desired to
eliminate the effects of chance. The usual correction for chance in a
test score is

$$S = R - \frac{W}{n-1}, \qquad (1)$$

in which S = the most probable number of items that the testee
can actually master without the aid of chance.
R = the number of correct responses.
W = the number of incorrect responses.
n = the number of alternative responses to each item.*


As applied to the present problem, S would stand for the
number of individuals in the test population who could actually pass the
item without the aid of chance. R is the number who actually pass,
some by accident, and W is the number who fail. None of the W’s are
assumed to have failed by chance. The proportion of genuine
successes is given by the ratio S/N, where N is the number of
individuals. The corrected proportion is then given by the relation

$${}_{c}p = \frac{S}{N} = \frac{R - \frac{W}{n-1}}{N}. \qquad (2)$$


*Op. cit., p. 445.
This relation can be simplified to some extent from the fact that
W = N - R. Then

$${}_{c}p = \frac{R - \frac{N-R}{n-1}}{N} = \frac{nR - R - N + R}{N(n-1)} = \frac{nR - N}{N(n-1)}. \qquad (3)$$

Dividing through by N, we have the form

$${}_{c}p = \frac{n\frac{R}{N} - 1}{n-1}. \qquad (4)$$


But the ratio R/N gives the uncorrected proportion of successes, or
p. Substituting p in equation (4), we have

$${}_{c}p = \frac{np - 1}{n-1}, \qquad (5)$$

an equation that would be preferred when the raw proportion of
successes is already known. A still more useful form of the same
equation is derived from (5).

$${}_{c}p = \frac{np}{n-1} - \frac{1}{n-1} = \left(\frac{n}{n-1}\right)p - \left(\frac{1}{n-1}\right). \qquad (6)$$
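
The correction may be illustrated by a short computation. The sketch below is offered under assumptions: the function names are ours, and the three functions merely restate the count form of equations (1) and (2), the ratio form of equation (5), and the linear form of equation (6), so that their agreement can be checked.

```python
def corrected_from_counts(R, W, n):
    """Corrected proportion from counts: S = R - W/(n - 1), then S/N."""
    N = R + W
    S = R - W / (n - 1)
    return S / N


def corrected_p(p, n):
    """Equation (5): corrected proportion from the raw proportion p."""
    return (n * p - 1) / (n - 1)


def corrected_p_linear(p, n):
    """Equation (6): the same correction as a straight line in p."""
    slope = n / (n - 1)
    intercept = 1 / (n - 1)
    return slope * p - intercept


# Raw proportion .99 for two to eight alternatives.
print([round(corrected_p(0.99, n), 4) for n in range(2, 9)])
# 80 passes and 20 failures on a five-choice item.
print(corrected_from_counts(80, 20, 5), corrected_p(0.80, 5))
```

A raw proportion below 1/n yields a negative corrected value, which can only be read as a proportion indistinguishable from chance success.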


The last equation is in linear form. For any value of n the parameters
n/(n - 1) and 1/(n - 1) are fixed and readily computed. Machine
computations of p_c from p are reduced to a simple routine. Nomographic
charts can be readily drawn with one straight line for each value of
n. Still more convenient perhaps, when the uncorrected p is already
known, is Table I, which includes corrected proportions for values of
n up to 8 and for all values of uncorrected p to two decimal places.
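
The routine computation described above may be sketched in a few lines of Python. This is an illustration only; the function names are our own and do not appear in the original. Both forms, equation (5) and the linear equation (6), are given so that their agreement can be seen.

```python
def corrected_proportion(p, n):
    """Equation (5): corrected proportion of successes, (n*p - 1) / (n - 1)."""
    if n < 2:
        raise ValueError("an item needs at least two alternative responses")
    return (n * p - 1) / (n - 1)


def corrected_proportion_linear(p, n):
    """Equation (6): the same correction written as slope * p - intercept."""
    slope = n / (n - 1)        # fixed once n is known
    intercept = 1 / (n - 1)    # fixed once n is known
    return slope * p - intercept


# A five-alternative item passed by 60 per cent of the group.
print(round(corrected_proportion(0.60, 5), 4))         # 0.5
print(round(corrected_proportion_linear(0.60, 5), 4))  # 0.5
```

The value agrees with the entry of Table I for p = .60 and n = 5.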
Some investigators may wish to work in terms of failures rather
than in terms of successes; with q (q = 1 - p) rather than p. Since
q_c = 1 - p_c, we may modify equation (5) to read

$$q_c = 1 - \frac{n(1 - q) - 1}{n - 1} = \frac{n - 1 - n + nq + 1}{n - 1} = \frac{nq}{n - 1} = \left(\frac{n}{n - 1}\right) q. \tag{7}$$


Since g = W/N = (N — R)/N, in terms of the original numbers of 
successes and failures for an item, 


$$q_c = \left(\frac{n}{n - 1}\right)\frac{W}{N} = \left(\frac{n}{n - 1}\right)\frac{N - R}{N}. \tag{8}$$


In some respects the use of q, or of W, is more direct than the 
use of p, or of R, since there is a direct, positive relationship between 
the number or proportion of failures and the scale value for difficulty. 
The use of p or of R necessitates a reversal of direction or of alge- 
braic sign of the scale values. Both, naturally, lead to the same devi- 
ates from the mean. 
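
As a check on equations (7) and (8), the following Python sketch computes the corrected proportion of failures both from q and from the raw counts, and confirms that it is the complement of the corrected proportion of successes. The function names and the counts (N = 200, W = 60, n = 4) are hypothetical, chosen only for illustration.

```python
def corrected_failures(q, n):
    """Equation (7): corrected proportion of failures, n*q / (n - 1)."""
    return n * q / (n - 1)


def corrected_failures_from_counts(wrong, total, n):
    """Equation (8): the same quantity from W failures among N individuals."""
    return n * wrong / ((n - 1) * total)


def corrected_successes(p, n):
    """Equation (5), repeated here so that the block is self-contained."""
    return (n * p - 1) / (n - 1)


# Hypothetical item: 200 individuals, 60 failures, four alternatives.
q_c = corrected_failures_from_counts(60, 200, 4)
p_c = corrected_successes(140 / 200, 4)
print(round(q_c, 3), round(p_c, 3), round(q_c + p_c, 3))  # 0.4 0.6 1.0
```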

In any event, whether an investigator uses p or q, whether he 
begins with a count of the R’s or the W’s, he will find among the for- 
mulas presented here something to fit his particular case, and he may 
feel sure that his corrected proportions will be fairly comparable re- 
gardless of the varying numbers of alternative responses to his test 
items. The correction is of course an estimate based upon the theory 
of probabilities, and like all such estimates is strictly valid for large 
numbers of cases only. For the same reason, the correction is valid 
only if we may make the customary assumptions concerning the mul- 
tiple-choice situation; that the alternative responses are independent 
of one another for the incompetent testee and equally likely to occur. 
Under the provisions just enumerated the corrections are recommended.

TABLE I

A TABLE OF CORRECTED PROPORTIONS OF SUCCESSES FOR
VARYING NUMBERS OF ALTERNATIVE RESPONSES

Alternatives     2      3      4      5       6      7      8
     p
    .99        .980   .985   .987   .9875   .988   .988   .989
    .98        .960   .970   .973   .9750   .976   .977   .977
    .97        .940   .955   .960   .9625   .964   .965   .966
    .96        .920   .940   .947   .9500   .952   .953   .954
    .95        .900   .925   .933   .9375   .940   .942   .943
    .94        .880   .910   .920   .9250   .928   .930   .931
    .93        .860   .895   .907   .9125   .916   .918   .920
    .92        .840   .880   .893   .9000   .904   .907   .909
    .91        .820   .865   .880   .8875   .892   .895   .897
    .90        .800   .850   .867   .8750   .880   .883   .886
    .89        .780   .835   .853   .8625   .868   .872   .874
    .88        .760   .820   .840   .8500   .856   .860   .863
    .87        .740   .805   .827   .8375   .844   .848   .851
    .86        .720   .790   .813   .8250   .832   .837   .840
    .85        .700   .775   .800   .8125   .820   .825   .829
    .84        .680   .760   .787   .8000   .808   .813   .817
    .83        .660   .745   .773   .7875   .796   .802   .806
    .82        .640   .730   .760   .7750   .784   .790   .794
    .81        .620   .715   .747   .7625   .772   .778   .783
    .80        .600   .700   .733   .7500   .760   .767   .771
    .79        .580   .685   .720   .7375   .748   .755   .760
    .78        .560   .670   .707   .7250   .736   .743   .749
    .77        .540   .655   .693   .7125   .724   .732   .737
    .76        .520   .640   .680   .7000   .712   .720   .726
    .75        .500   .625   .667   .6875   .700   .708   .714
    .74        .480   .610   .653   .6750   .688   .697   .703
    .73        .460   .595   .640   .6625   .676   .685   .691
    .72        .440   .580   .627   .6500   .664   .673   .680
    .71        .420   .565   .613   .6375   .652   .662   .669
    .70        .400   .550   .600   .6250   .640   .650   .657
    .69        .380   .535   .587   .6125   .628   .638   .646
    .68        .360   .520   .573   .6000   .616   .627   .634
    .67        .340   .505   .560   .5875   .604   .615   .623
    .66        .320   .490   .547   .5750   .592   .603   .611
    .65        .300   .475   .533   .5625   .580   .592   .600
    .64        .280   .460   .520   .5500   .568   .580   .589
    .63        .260   .445   .507   .5375   .556   .568   .577
    .62        .240   .430   .493   .5250   .544   .557   .566
    .61        .220   .415   .480   .5125   .532   .545   .554
    .60        .200   .400   .467   .5000   .520   .533   .543
    .59        .180   .385   .453   .4875   .508   .522   .531
    .58        .160   .370   .440   .4750   .496   .510   .520
    .57        .140   .355   .427   .4625   .484   .498   .509
    .56        .120   .340   .413   .4500   .472   .487   .497
    .55        .100   .325   .400   .4375   .460   .475   .486
    .54        .080   .310   .387   .4250   .448   .463   .474
    .53        .060   .295   .373   .4125   .436   .452   .463
    .52        .040   .280   .360   .4000   .424   .440   .451
    .51        .020   .265   .347   .3875   .412   .428   .440
    .50        .000   .250   .333   .3750   .400   .417   .429
    .49        .000   .235   .320   .3625   .388   .405   .417
    .48        .000   .220   .307   .3500   .376   .393   .406
    .47               .205   .293   .3375   .364   .382   .394
    .46               .190   .280   .3250   .352   .370   .383
    .45               .175   .267   .3125   .340   .358   .371
    .44               .160   .253   .3000   .328   .347   .360
    .43               .145   .240   .2875   .316   .335   .349
    .42               .130   .227   .2750   .304   .323   .337
    .41               .115   .213   .2625   .292   .312   .326
    .40               .100   .200   .2500   .280   .300   .314
    .39               .085   .187   .2375   .268   .288   .303
    .38               .070   .173   .2250   .256   .277   .291
    .37               .055   .160   .2125   .244   .265   .280
    .36               .040   .147   .2000   .232   .253   .269
    .35               .025   .133   .1875   .220   .242   .257
    .34               .010   .120   .1750   .208   .230   .246
    .33               .000   .107   .1625   .196   .218   .234
    .32               .000   .093   .1500   .184   .207   .223
    .31               .000   .080   .1375   .172   .195   .211
    .30                      .067   .1250   .160   .183   .200
    .29                      .053   .1125   .148   .172   .189
    .28                      .040   .1000   .136   .160   .177
    .27                      .027   .0875   .124   .148   .166
    .26                      .013   .0750   .112   .137   .154
    .25                      .000   .0625   .100   .125   .143
    .24                      .000   .0500   .088   .113   .131
    .23                      .000   .0375   .076   .102   .120
    .22                             .0250   .064   .090   .109
    .21                             .0125   .052   .078   .097
    .20                             .0000   .040   .067   .086
    .19                             .0000   .028   .055   .074
    .18                             .0000   .016   .043   .063
    .17                                     .004   .032   .051
    .16                                     .000   .020   .040
    .15                                     .000   .008   .029
    .14                                     .000   .000   .017
    .13                                            .000   .006
    .12                                            .000   .000
    .11                                                   .000
    .10                                                   .000
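
Any row of Table I can be regenerated from equation (5). The Python sketch below is offered as a checking aid under two assumptions of our own: negative corrected values are set to zero, as the table appears to do, and exact fractions are used to avoid rounding accidents. The function names are hypothetical. The printed table leaves cells blank a few rows after a column first reaches zero; the sketch simply returns zero in those cells.

```python
from fractions import Fraction


def table_entry(hundredths, n):
    """Corrected proportion for p = hundredths/100, floored at zero."""
    value = Fraction(n * hundredths - 100, 100 * (n - 1))
    return max(value, Fraction(0))


def table_row(hundredths, alternatives=range(2, 9)):
    """One row of the table as strings, for n = 2 through 8."""
    cells = []
    for n in alternatives:
        digits = 4 if n == 5 else 3   # the n = 5 column is carried to four places
        cells.append(f"{float(table_entry(hundredths, n)):.{digits}f}")
    return cells


print(table_row(75))
# ['0.500', '0.625', '0.667', '0.6875', '0.700', '0.708', '0.714']
```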
MATHEMATICAL BIOPHYSICS OF DELAYED REFLEXES IN
CONNECTION WITH THE THEORY OF
ERROR ELIMINATION


N. RASHEVSKY 


The University of Chicago 


In continuation of a previous paper, a mechanism of delayed 
reflexes is considered more in detail. Equations governing such a 
mechanism are established and approximately solved. The formulae 
thus obtained describe the phenomenon of “concentration” of a con- 
ditioned reflex around a definite time interval after stimulation. Ap- 
plied along the lines discussed in the previous paper to some simple 
combinations of stimuli and responses the formulae lead to a descrip- 
tion of the elimination of errors by trial. They give a relation be- 
tween the number of repetitions, necessary to eliminate a wrong act, 
and other constants, describing the situation. 


In a previous paper,* published in this journal and referred to
hereafter as loc. cit., we have outlined a program of applications of
mathematical biophysics to psychology. In the present paper we shall
elaborate in some detail the problem discussed in sections V and VII
of loc. cit.

On the basis of the mechanisms postulated in loc. cit. we have
arrived at the conclusion that the strength of a conditioned response
increases with the number n of simultaneous repetitions of the
unconditioned and conditioned stimulus according to a curve which is
closely related to the integral curve of a frequency distribution of
certain physical characteristics of the neurones. Depending on the
special assumptions which we shall make about this frequency curve,
the “conditioning curve” will or will not have an inflection point, but
it always will tend asymptotically to a constant value. Similar
conclusions are arrived at by considering a mechanism of conditioning
entirely different from the one discussed in loc. cit.† In elaborating a
system of mathematico-biophysical psychology, we must investigate
systematically the consequences of all possible physical postulates. In
the present paper we shall confine ourselves to the case when the
“conditioning curve” is given by a simple exponential, so that the
intensity of the conditioned response is proportional to

$$I(1 - e^{-an}), \tag{1}$$

where I is the intensity of the central excitation at the time of
conditioning, n the number of repetitions of the conditioned and
unconditioned stimuli taken simultaneously, and a a constant, depending
amongst other things on the interval between successive repetitions.
We consider here the latter as being kept constant. Thus we have:

$$R = FI(1 - e^{-an}), \tag{2}$$

F being a factor of proportionality, itself depending on various other
constants. Let us now consider, as in section V of loc. cit., that a
conditioned peripheral stimulus s_c elicits a central excitation process,
which is propagated wavelike over a number of centers. Again we
may make a number of different assumptions both as to the arrangement
of the centers over which the excitation wave travels, and as to
the shape of the “wave”. And again for the sake of simplicity
and definiteness, we shall pick out two simple hypotheses. Their
choice is arbitrary, but does not exclude any other more complex
possibilities. We shall mention those later on in this paper and develop
them at length in future publications.

*Rashevsky, N. Mathematical Biophysics and Psychology. Psychometrika,
1936, 1, 1-26.

†Rashevsky, N. Outline of a Physico-Mathematical Theory of the Brain.
Jour. Gen. Psychol., 1935, 13, 82-111.

We consider a linear arrangement of the centers, and consider
that the “wave” has a very steep front, falling off to the rear (Fig.
1, heavy line). With sufficient approximation, such a curve may be
represented by

$$I(x) = I_0 e^{-\beta x}, \tag{3}$$

as shown by the dotted line. We also restrict ourselves in this paper
to the case that the “tail” of the wave falls off very steeply, in other
words that β ≫ 1. This does not introduce any limitations in principle
but simplifies considerably our formulae.

Let the wave travel from right to left, with a constant velocity
v. The point occupied by the “front” τ seconds after the stimulation
we shall choose as x = 0. Then at the moment of stimulation t = 0,
the “front” of the wave is created and begins to travel from the point
x_0 = vτ. Any center, occupying the position x, is reached by the
“front” of the wave at a time:

$$t = \tau - \frac{x}{v}, \qquad (0 < x < x_0). \tag{4}$$

Let at the moment τ, that is τ seconds after the conditioned stimulus,
an unconditioned stimulus s_u be produced, resulting in an unconditioned
reflex R_u. Then since any center between x = 0 and x = x_0 = vτ
is being stimulated simultaneously with s_u, they will begin to be
conditioned to the reflex R_u, and if the whole procedure is repeated a
sufficient number of times, the stimulation of any center x will result
in a reflex R_c(x) according to (2). But the quantity I in (2) is given
by (3). Therefore the intensity of the reflex R_c(x) from any center
x will be the weaker, the larger x. If after a sufficient number of
repetitions s_c is applied, then the excitation front reaches a center x at
the time t given by (4). At this moment t the center x will elicit a
response R_c(x), the intensity of which will be proportional to:

$$I(t) = I(x) = I_0 e^{-\beta x} = I_0 e^{-\beta v(\tau - t)}. \tag{5}$$

In other words the conditioned response will be the stronger, the closer
t is to τ. If during the process of conditioning, the unconditioned
stimulus is applied always τ seconds after the conditioned, then the
maximum conditioned response will also be elicited through the center
which is reached by the wave-front exactly τ seconds after the
conditioned stimulation.
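
Equations (4) and (5) can be illustrated by a short Python sketch. The numerical values of τ, v and β below are arbitrary and serve only to show that the intensity grows as t approaches τ; the function names are ours, not part of the theory.

```python
import math


def arrival_time(x, tau, v):
    """Equation (4): time at which the wave front reaches the center at x."""
    return tau - x / v


def response_intensity(t, tau, v, beta, i0=1.0):
    """Equation (5): intensity proportional to I0 * exp(-beta * v * (tau - t))."""
    return i0 * math.exp(-beta * v * (tau - t))


# Arbitrary illustrative constants.
tau, v, beta = 2.0, 1.0, 3.0
for t in (0.0, 1.0, 2.0):
    print(t, response_intensity(t, tau, v, beta))
```

With these constants the intensity is exp(-6) at t = 0, exp(-3) at t = 1, and unity at t = τ.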

At the moment t the wave-front reaches the center x. None of
the centers which lie at x' < x are yet stimulated. But all centers
between x and x_0 are stimulated, the intensity of stimulation being
given by (5). Therefore the total response at the moment t, after
conditioning is completed, is given by

$$R_c(t) = F\int_x^{x_0} I(x)\,dx = (FI_0/\beta)\left(e^{-\beta x} - e^{-\beta x_0}\right) = (FI_0/\beta)\left(e^{-\beta v(\tau - t)} - e^{-\beta v\tau}\right).$$

R_c(t) is also strongest for t = τ.
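
The closed form for the total response can be checked against a direct numerical integration of (3). The following Python sketch does so with a simple midpoint rule; the constants are arbitrary and the function names are hypothetical.

```python
import math


def total_response(t, tau, v, beta, F=1.0, i0=1.0):
    """Closed form: (F*I0/beta) * (exp(-beta*x) - exp(-beta*x0)), with x = v*(tau - t)."""
    x = v * (tau - t)      # position of the wave front at time t
    x0 = v * tau           # starting point of the wave front
    return (F * i0 / beta) * (math.exp(-beta * x) - math.exp(-beta * x0))


def total_response_numeric(t, tau, v, beta, F=1.0, i0=1.0, steps=10000):
    """Midpoint-rule integration of F * I0 * exp(-beta*x) from x to x0."""
    x = v * (tau - t)
    x0 = v * tau
    h = (x0 - x) / steps
    total = sum(i0 * math.exp(-beta * (x + (k + 0.5) * h)) for k in range(steps))
    return F * total * h


print(total_response(1.0, 2.0, 1.0, 3.0))
print(total_response_numeric(1.0, 2.0, 1.0, 3.0))
```

The two printed values should agree to many decimal places, and the response increases as t approaches τ.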

More complex and interesting results are obtained if we consider,
as in loc. cit. section V, that the efferent conditioned centers send
inhibiting fibers to the afferent unconditioned ones. In this case, as we
have seen, as a stronger reflex R(x) from a center near x = 0 develops,
it will inhibit the other centers lying further from x = 0, so that
eventually only centers in the immediate neighborhood of x = 0 will
elicit a reflex; all others will be completely inhibited. Mathematically
the situation is described by expressing that the intensity I in (2) is
decreasing gradually as more neurones become conditioned in other
centers. If a center x_i elicits a response R(x_i), then a center at x is
inhibited to an extent proportional to R(x_i), so that the intensity of
excitation I(x), which originally was I_0 e^{-βx}, becomes now
I_0 e^{-βx} - bR(x_i), b being a constant of proportionality. The total
intensity at x is now given by

$$I(x) = I_0 e^{-\beta x} - b\sum_i R(x_i), \tag{6}$$


the summation being taken over all those centers x_i which are not
completely inhibited due to the activity of other centers. When the
number of centers is very large, we can replace the sum by an integral
and find thus for the determination of I(x) the functional equation:

$$I(x) = I_0 e^{-\beta x} - b\int_0^k R(x)\,dx, \tag{7}$$

or introducing for R(x) its expression (2)

$$I(x) = I_0 e^{-\beta x} - bF\int_0^k I(x)\,(1 - e^{-an})\,dx. \tag{8}$$


The upper limit k of integration is itself a function of n. For as the
conditioning of a center proceeds, R(x) increases, the inhibition of
the other centers increases also, and some of them, namely those that
had a small initial I(x) = I_0 e^{-βx}, will become completely inhibited.
Therefore they will not contribute anything to the inhibition of other
centers. We can determine k in the following way: We first solve (8)
for any k, and thus obtain I as a function of the variable x and the
parameters k and n:

$$I = I(x, k, n). \tag{9}$$

Then, since the integration in (8) is to be carried out only up to such
a value of x for which the centers are not completely inhibited, in
other words for which I > 0, we find k as the root of the equation:

$$I(k, k, n) = 0, \tag{10}$$

obtained from (9) by substituting k for x. Equation (10) gives us
then k as a function of n.

Equ. (8) can be treated similarly to integral equations. Let us
develop I(x) into a power series of the parameter b.

$$I(x) = I^{0}(x) + bI'(x) + b^{2}I''(x) + \cdots \tag{11}$$

Introducing this into (8) and comparing the coefficients of b, b², etc.
we find:

$$I^{0}(x) = I_0 e^{-\beta x}; \qquad I'(x) = -F(1 - e^{-an})\int_0^k I^{0}(x)\,dx, \tag{12}$$

$$I''(x) = -F(1 - e^{-an})\int_0^k I'(x)\,dx, \quad \text{etc.},$$

so that I⁰(x), I'(x), I''(x) and so on can be determined successively.
The method holds however only when (11) converges, which will happen
for sufficiently small b. We therefore explicitly restrict the present
treatment to such small values of b, in other words to cases where
the mutual inhibition is not too strong. In that case for a first
approximation we may neglect squares and higher powers of b. We need
therefore calculate only I'(x), and find thus after a simple calculation:


$$I'(x) = -(FI_0/\beta)\,(1 - e^{-\beta k})\,(1 - e^{-an}),$$

or

$$I(x, k, n) = I_0 e^{-\beta x} - (bFI_0/\beta)\,(1 - e^{-\beta k})\,(1 - e^{-an}). \tag{13}$$

k is determined by introducing (13) into (10), which gives:

$$I_0 e^{-\beta k} - (bFI_0/\beta)\,(1 - e^{-\beta k})\,(1 - e^{-an}) = 0, \tag{14}$$

$$k = (1/\beta)\log\left[1 + \beta/bF(1 - e^{-an})\right]. \tag{15}$$

As n increases, k decreases, tending asymptotically to

$$k_0 = (1/\beta)\log\left(1 + \beta/bF\right), \tag{16}$$

which shows that the conditioned response, which initially is spread
over all the centers between 0 and x_0 = vτ, gradually “concentrates”
only around x = 0. In other words, while in the early stages of
conditioning a conditioned response will be elicited at any time t
(0 < t < τ), after a sufficient number of repetitions a response will be
produced only between t_0 and τ, where

$$t_0 = \tau - \frac{k_0}{v}, \tag{17}$$

k_0 being given by (16). This corresponds to the facts discussed by
Pavlov.*
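
The “concentration” expressed by (15), (16) and (17) may be followed numerically. In the Python sketch below the constants a, β, b, F, τ and v are arbitrary illustrative values, and the function names are ours.

```python
import math


def inhibition_limit(n, a, beta, b, F):
    """Equation (15): upper limit k of the uninhibited region after n repetitions (n > 0)."""
    return (1.0 / beta) * math.log(1.0 + beta / (b * F * (1.0 - math.exp(-a * n))))


def asymptotic_limit(beta, b, F):
    """Equation (16): the value k0 approached by k as n increases."""
    return (1.0 / beta) * math.log(1.0 + beta / (b * F))


def earliest_response_time(tau, v, beta, b, F):
    """Equation (17): t0 = tau - k0 / v."""
    return tau - asymptotic_limit(beta, b, F) / v


# Arbitrary illustrative constants.
a, beta, b, F = 0.1, 10.0, 0.5, 1.0
for n in (1, 10, 100):
    print(n, inhibition_limit(n, a, beta, b, F))
print("k0 =", asymptotic_limit(beta, b, F))
print("t0 =", earliest_response_time(1.0, 1.0, beta, b, F))
```

The printed values of k decrease with n toward k0, in agreement with the statement in the text.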

The conditioned response R_c(x) from a center x is obtained by
substituting (15) into (13) and the latter into (8). We thus find after
simple calculations:

$$R_c(x, n) = FI_0\left\{e^{-\beta x} - bF(1 - e^{-an})/[\beta + bF(1 - e^{-an})]\right\}(1 - e^{-an}). \tag{18}$$


*Pavlov, I. Conditioned Reflexes. Oxford Univ. Press. 1927.

If, as we have assumed, β is very large and b very small, then the
term bF(1 - e^{-an}) in the denominator may be neglected as compared
with β. This gives with a sufficient approximation:

$$R_c(x, n) = FI_0\left\{e^{-\beta x} - bF(1 - e^{-an})/\beta\right\}(1 - e^{-an}). \tag{19}$$

The subsequent results are not essentially affected by this omission.
Using (18) instead of (19) would only give much more cumbersome
expressions. Introducing instead of x its expression from
(4) we find:

$$R_c(t, n) = FI_0\left\{e^{-\beta v(\tau - t)} - bF(1 - e^{-an})/\beta\right\}(1 - e^{-an}), \tag{20}$$

which gives the strength of the conditioned reflex after n repetitions,
from the center which is reached by the wave-front t seconds after the
application of the conditioned stimulus. Again as before, in order to
obtain the total response we must integrate the expression (20) between
the limits 0 and t, which is easily done but results in a somewhat
cumbersome expression. For our purposes however we shall be
interested in an actual expression for the total response for t = 0,
or x = x_0 = vτ, that is immediately after the conditioned stimulation.
Since at that moment only the first center, that one at x_0, is
stimulated, we obtain the desired total response by simply making t = 0
in (20).

Differentiating (20) with respect to n and making the derivative
equal to zero, we find that R_c(t,n) reaches a maximum value after n*
repetitions with

$$n^* = -(1/a)\log\left[1 - \beta e^{-\beta v(\tau - t)}/2bF\right]. \tag{21}$$

After that R_c(t,n) begins to decrease and for sufficiently large
τ - t even becomes negative, as can be seen from (19). The response
becomes completely inhibited.

The maximum value of R_c(t,n) is obtained by introducing (21)
into (20). This gives

$$R_{max}(t) = (\beta I_0/4b)\,e^{-2\beta v(\tau - t)}. \tag{22}$$

For t = 0 this gives the maximum total response.
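
The relation between (20), (21) and (22) can be verified numerically. The Python sketch below evaluates (20) at the value n* of (21) and compares the result with (22). The constants are arbitrary, chosen so that the logarithm in (21) has a positive argument; the function names are hypothetical.

```python
import math


def reflex_strength(n, t, tau, v, a, beta, b, F=1.0, i0=1.0):
    """Equation (20): strength of the conditioned reflex after n repetitions."""
    growth = 1.0 - math.exp(-a * n)
    return F * i0 * (math.exp(-beta * v * (tau - t)) - b * F * growth / beta) * growth


def optimal_repetitions(t, tau, v, a, beta, b, F=1.0):
    """Equation (21): number of repetitions n* at which (20) is greatest."""
    arg = 1.0 - beta * math.exp(-beta * v * (tau - t)) / (2.0 * b * F)
    if arg <= 0.0:
        raise ValueError("no maximum at finite n: the response keeps growing with n")
    return -math.log(arg) / a


def maximum_strength(t, tau, v, beta, b, i0=1.0):
    """Equation (22): the maximum value of (20)."""
    return (beta * i0 / (4.0 * b)) * math.exp(-2.0 * beta * v * (tau - t))


# Arbitrary illustrative constants: t = 0, tau = 0.5, v = 1, a = 0.1, beta = 10, b = 0.1.
n_star = optimal_repetitions(0.0, 0.5, 1.0, 0.1, 10.0, 0.1)
print(n_star)
print(reflex_strength(n_star, 0.0, 0.5, 1.0, 0.1, 10.0, 0.1))
print(maximum_strength(0.0, 0.5, 1.0, 10.0, 0.1))
```

The last two printed values should coincide, and (20) evaluated slightly before or after n* should be smaller.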

Let us now discuss the case considered in section VII of loc. cit.
Let a stimulus S_1 produce an unconditioned response R_1, which in
turn results in the production after a time τ of a stimulus S_2, which
produces an unconditioned response R_2, which is the opposite of R_1.
For instance the sight of an alley in a maze may be considered as S_1;
R_1 is the response to that, consisting in running into that alley. If the
latter ends in a cul-de-sac, then this produces the reaction R_2, the
retracing of one's steps back. τ is proportional to the length l of the
alley, and the speed q of the animal:

$$\tau = ql. \tag{23}$$


If the procedure is repeated a sufficient number of times, then
S_1 becomes conditioned to R_2. R_2 becomes a delayed conditioned
reaction to S_1, with its maximum strength τ seconds after the
presentation of S_1. But a certain amount of the response R_2 will be
elicited also at the time t = 0, at the beginning of the conditioning
process, when the inhibition is not yet complete. The maximum intensity
of response immediately after S_1 is given by making in (20) t = 0. If
this intensity R_2(0,n) is greater than R_1, in other words if:

$$FI_0\left\{e^{-\beta v\tau} - bF(1 - e^{-an})/\beta\right\}(1 - e^{-an}) > R_1, \tag{24}$$

then at t = 0, that is immediately upon presentation of S_1, R_2 > R_1
will be produced, and R_1 will not be elicited at all. R_1, the “wrong”
response, thus becomes eliminated.

In order that this should be possible at all, the maximum value
of R_2(0), which is obtained by putting t = 0 in (22), must be larger
than R_1. Hence

$$(\beta I_0/4b)\,e^{-2\beta v\tau} > R_1. \tag{25}$$


Since, according to (20), for any t, R_c(t,n) first increases from
zero with increasing n, and then decreases, condition (24) can be
satisfied only for a sufficiently large n. The number n of repetitions
necessary for the elimination of R_1 is given by the smaller of the roots
of the equation:

$$R_2(0, n) = R_1, \tag{26}$$

or

$$FI_0\left\{e^{-\beta v\tau} - bF(1 - e^{-an})/\beta\right\}(1 - e^{-an}) = R_1. \tag{27}$$


This transcendental equation may be approximately solved for
sufficiently small values of a by expanding e^{-an} into a series

$$e^{-an} = 1 - an + a^{2}n^{2}/2 - \cdots, \tag{28}$$

and omitting powers higher than the second. In this way the following
quadratic equation is obtained, after simple arrangements:

$$-FI_0 a^{2}\left(\tfrac{1}{2}e^{-\beta v\tau} + bF/\beta\right)n^{2} + FI_0 a e^{-\beta v\tau} n - R_1 = 0. \tag{29}$$

In order that this equation should have any real roots at all, we must
have:

$$F^{2}I_0^{2}a^{2}e^{-2\beta v\tau} - 4FI_0 R_1 a^{2}\left(\tfrac{1}{2}e^{-\beta v\tau} + bF/\beta\right) > 0, \tag{30}$$

which gives

$$R_1 < \frac{FI_0 e^{-2\beta v\tau}}{2e^{-\beta v\tau} + 4bF/\beta}, \tag{31}$$

which with the approximation here used is equivalent to (25). When
(31) is satisfied, then the smaller root of (29), which gives the number
of repetitions necessary to satisfy (24), in other words necessary
to eliminate R_1, is equal to:

$$n = \frac{e^{-\beta v\tau} - \left[e^{-2\beta v\tau} - (R_1/FI_0)\,(2e^{-\beta v\tau} + 4bF/\beta)\right]^{1/2}}{a\,(e^{-\beta v\tau} + 2bF/\beta)}. \tag{32}$$
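
The quality of the quadratic approximation (29) can be judged by comparing its smaller root with the corresponding root of the untruncated equation (27). The Python sketch below rests on an observation of ours that is not made in the text: (27) is itself a quadratic in u = 1 - e^{-an}, so that its root can be written down directly. All numerical constants are arbitrary and the function names are hypothetical.

```python
import math


def repetitions_exact(r1, tau, v, a, beta, b, F=1.0, i0=1.0):
    """Smaller root of (27), solved as a quadratic in u = 1 - exp(-a*n)."""
    front = math.exp(-beta * v * tau)
    c = b * F / beta
    disc = front * front - 4.0 * c * r1 / (F * i0)
    if disc < 0.0:
        return None                      # R_1 is never reached
    u = (front - math.sqrt(disc)) / (2.0 * c)
    if u >= 1.0:
        return None                      # would require infinitely many repetitions
    return -math.log(1.0 - u) / a


def repetitions_quadratic(r1, tau, v, a, beta, b, F=1.0, i0=1.0):
    """Smaller root of the approximate quadratic (29) in n."""
    front = math.exp(-beta * v * tau)
    qa = F * i0 * a * a * (0.5 * front + b * F / beta)   # coefficient of n**2 (sign reversed)
    qb = F * i0 * a * front                              # coefficient of n
    disc = qb * qb - 4.0 * qa * r1
    if disc < 0.0:
        return None                      # condition (30) is violated
    return (qb - math.sqrt(disc)) / (2.0 * qa)


# Arbitrary constants: R_1 = 0.02, tau = 0.1, v = 1, a = 0.01, beta = 10, b = 0.1.
print(repetitions_exact(0.02, 0.1, 1.0, 0.01, 10.0, 0.1))
print(repetitions_quadratic(0.02, 0.1, 1.0, 0.01, 10.0, 0.1))
```

For these constants an is small, and the two roots differ only in the third significant figure.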





As R_1 increases, n increases also. In other words it takes longer
training to eliminate a wrong action, if the natural impulse for that
action is strong. Introducing (23), (32) becomes:

$$n = \frac{e^{-\beta vql} - \left[e^{-2\beta vql} - (R_1/FI_0)\,(2e^{-\beta vql} + 4bF/\beta)\right]^{1/2}}{a\,(e^{-\beta vql} + 2bF/\beta)}. \tag{33}$$

As l increases, the expression in brackets decreases and finally
becomes negative when

$$e^{-2\beta vql} = (R_1/FI_0)\,(2e^{-\beta vql} + 4bF/\beta), \tag{34}$$

or, approximately for very large l:

$$l > (1/2\beta vq)\log(I_0\beta/4R_1 b), \tag{35}$$

or also when:

$$\tau > (1/2\beta v)\log(I_0\beta/4R_1 b). \tag{36}$$


Using an anthropopsychic terminology, we may say that when the time
between the initiation of a “wrong” act and the realization that the
act is “wrong” exceeds a critical value, given by (36), the wrong act
cannot be eliminated at all. In terms of the maze picture, when the
wrong alley is too long, it will never be eliminated. Below the critical
value the number of repetitions necessary for the elimination
increases with the length l of the alley. An exact treatment of Equ. (27),
for any value of a, does not change those conclusions qualitatively.

The lack of knowledge at present of the numerical values of the
various parameters involved, as well as the lack of quantitative
experimental data, makes a closer comparison of our formulae with
experiments hardly possible. Furthermore those formulae are derived
under rather special assumptions. If instead of (1), we consider a
conditioning curve with an inflection point, the formulae become more
complicated, but the general conclusions remain essentially
unchanged. The same may be said about a different choice of the shape
of the excitation wave, making it for instance more symmetrical. We
may take an expression like:

$$x e^{-\beta x}$$

or any other similar ones. More radical changes may be obtained by
considering that the number of branches of inhibitory fibers decreases
gradually as we move from the center where they originate. In this
case the mutual inhibition of two centers will be the smaller, the
further those two centers are apart. All those and other cases will be
studied in future publications.

The final aim of such a type of investigation is however this.
Relations such as (30), (32), (35) are in principle verifiable
experimentally. They give us equations between various unknown
parameters I_0, F, a, β, b, v, which characterize physically the central
nervous system, and such unknown quantities as n or l. In future we shall
derive other relations, describing different psycho-physiological
phenomena, involving the same constants. Thus it is hoped to establish
a number of equations, equal to the number of unknown constants,
and then express the latter in terms of measurable known quantities.
Before this can be done however a further elaboration of the
theoretical study under more general assumptions must be made.
A NOTE ON ITEM ANALYSIS
AND THE CRITERION OF INTERNAL CONSISTENCY


CHARLES I. MOSIER 


University of Florida, Gainesville, Florida 


The implications contained in Richardson’s article on item analysis
in the March 1936 issue of Psychometrika are examined in the light
of multiple factor theory. It is shown that item analysis is a
necessary, but not a sufficient condition for the construction of a test
which shall measure a single trait. The intercorrelations of certain
items selected by a method of item analysis are examined and found to
contain many zero and some negative correlations. Multiple factor
analysis showed that eight traits were measured by the items which
had been asserted to measure only one.


The purpose of this note is to make explicit certain implications 
contained in Richardson’s article on the rationale of item analysis 
(8), and to point out certain limitations of item analysis methods 
when used, as they often are, to prepare a measure of a ‘trait’ to 
which is then ascribed psychological significance. 

As Richardson points out, the selection of items whose sum will 
give the best prediction of an external criterion (such as success at 
selling life insurance, measured by amount sold), is best accomplished 
by other methods, and such a case will not be considered here. When, 
however, such an external criterion is not available, (as it often is 
not in the measurement of personality ‘traits’) it is frequently the 
practice to select a large number of items which are thought from 
a priori analysis, clinical descriptions, or other considerations, to 
measure a certain trait, and then to apply one of the various methods 
of item analysis (6) to determine which smaller set of items fur- 
nishes the best measure. Most of these methods employ some func- 
tion of the difference between two groups selected on the basis of 
their total score on the entire list, and as Richardson points out, are 
the equivalent of, or approximations to, the correlation coefficient be- 
tween the item and total score, or of the first centroid factor loadings 
of Thurstone (10). 

While this method is necessary, it is not sufficient for the
construction of a valid test unless the original list of items itself
measures a single trait. If, however, it measures a composite of two or
more traits, then the items selected will also measure a composite of
two or more traits. Quoting Richardson: “The item-test coefficient
gives an indication of the extent to which the item measures what the
test as a whole measures.” (8) (Italics mine). A consideration of a
simple hypothetical case will make clearer what is meant.

Let us suppose two traits (without worrying, for the time being, 
about what a ‘trait’ is), such that they are uncorrelated in the total 
population, and let us, for convenience, designate them Trait A and 
Trait B. Let us further assemble a group of a hundred items such 
that forty-five of the items require trait A for success, forty-five re- 
quire trait B for success, and on ten of the items an individual may 
succeed if he has either trait A or trait B. If now we apply this test 
of one hundred items to a standardization group and then select the 
items which have the highest item-test coefficients, we shall select, 
first of all, the ten items dependent on either trait, and then the items 
from the remainder which have the highest average intercorrelation, 
which does not preclude their having as many as half of their inter- 
correlations equal to zero. 


[Vector diagram: orthogonal vectors A and B, with the composite vector T]

FIGURE 1

Two Independent Traits, A and B,
And Their Composite Trait, T


A reference to the figure may clarify the argument. Traits A 
and B are represented as two orthogonal vectors, indicating their sta- 
tistical independence. The vector T is then a composite vector which 
represents the total test score, made up, as it is, of score in A (fifty 
items) and score in B (fifty items). The ten items which depend on 
either A or B will lie along, or close to the composite vector T, and 
are the ones which will be first selected as the best measures of the 
trait (and so they will be, if we are satisfied to define as our ‘trait’ the 
composite of A and B). Indeed, if all the variance of each item is in
the A-B space, it will be possible to obtain two items, each having
an item-test coefficient of .70, yet one of which measures A alone, and
the other B alone, and whose correlation with each other is zero.
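
The figure of .70 follows from the geometry of the diagram, if we treat the correlation between two standardized variables as the cosine of the angle between their vectors. The Python sketch below makes the computation explicit under the assumption, ours for the purpose of illustration, that the composite T weights A and B equally and that all variance lies in the A-B plane.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors, read here as a correlation."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


trait_a = (1.0, 0.0)                       # item measuring A alone
trait_b = (0.0, 1.0)                       # item measuring B alone
composite = (trait_a[0] + trait_b[0],      # total score T, equal weight to A and B
             trait_a[1] + trait_b[1])

print(round(cosine(trait_a, composite), 3))   # 0.707
print(round(cosine(trait_b, composite), 3))   # 0.707
print(cosine(trait_a, trait_b))               # 0.0
```

Each item thus correlates about .70 with the total, although the two items are wholly uncorrelated with each other.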

Furthermore, if we select the items in this way, either for meas- 
urement, or merely for the description of a trait, we will of necessity 
select those composite items each of which measures the composite 
trait, or those items which measure A alone or B alone and happen 
to have higher communalities (in the multiple factor sense), than the 
other items. Now the definition of a trait as the composite of two
other traits which can be differentiated and shown to be uncorrelated
(or to have true correlations different from unity) has certain
disadvantages which become obvious if we push the example to an
extreme and define a trait of height-intelligence. A tall moron and a
short genius might well make the same score.

These disadvantages are present in the theoretical situation when 
we have isolated a trait and desire to discover its nature by an exam- 
ination of the types of behavior in which it manifests itself. The dif- 
ficulty, even the futility, of attaching psychological meaning to a first 
centroid factor loading (8) is well known. The same difficulty mani- 
fests itself in the situation in which it is desired to secure a measure 
of the trait for some clinical purpose. If we return to our hypotheti- 
cal example, having selected the best items for a scale of measure- 
ment, we may apply it to a new group of subjects. How shall we in- 
terpret scores on such a test? It is quite true that an individual who 
scores extremely high on this test of trait A-B is probably quite high 
in both A and B. Likewise, an individual who scores extremely low 
on the test is very likely low in both A and B. With the individuals 
who make scores in the middle ranges, however, the interpretation is 
not so clean-cut. Three possibilities arise: 1) the individual may be 
average in both A and B; 2) he may be very high, even the highest 
individual in the group, as regards trait A, provided only that he be 
correspondingly low in trait B (and since A and B are assumed un- 
correlated, there is no reason why this will not occur); 3) just the 
opposite of (2) may obtain, i.e., high in B and low in A. Thus, even
in the practical situation of the clinic, the methods of internal
consistency for the validation of tests lead into difficulties. Cases in
which an individual made an average score on a test of emotional
instability and within a short time ‘cracked up’ are not unknown, and
can possibly be accounted for in just this way.
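
The three possibilities may be put in numerical form. In the Python sketch below the scores are invented for illustration, and the composite is taken, by assumption, to be the simple sum of the scores on the A items and the B items.

```python
# Hypothetical scores (A items, B items) for three individuals.
profiles = {
    "average in both A and B": (25, 25),
    "very high in A, low in B": (48, 2),
    "low in A, very high in B": (2, 48),
}

# The composite test score is assumed to be the plain sum of the two parts.
composite = {name: a + b for name, (a, b) in profiles.items()}

for name, score in composite.items():
    print(name, score)
```

All three individuals receive the same composite score of 50, so that the score alone cannot distinguish among them.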

Nor will the application of such refinements in our scaling
technique as that proposed by Zubin (12) resolve the difficulty. In the
hypothetical example, the correlation of a test item with a criterion
score of ninety-nine items will not produce results appreciably
different from correlation with the entire hundred. The solution lies, not
in the removal of the item being tested from the criterion, but in
removing all items which depend entirely or in greater part on any other
trait.

The existence of the difficulties which have been outlined above 
has not been unsuspected. Handy and Lentz (6) in their discussion 
of item value and test reliability state: “The presence of multiple
factors may or may not affect the accuracy of estimates.” We shall see
that it may. Again, Guilford, in an early article on an
introversion-extraversion scale (3), finds that the scale cannot be
represented by a linear continuum, but is multi-dimensional. In a later
work he finds that it measures no less than five uncorrelated traits (4).

That these difficulties are not entirely theoretical considerations, 
but actually occur in practice, may be seen from the discussion of an 
actual case. One test which has been validated by the method of in- 
ternal consistency, and has been widely used, not only as a measuring 
instrument, but as a criterion measure for the validation of other 
scales, is the Thurstone Personality Schedule (9). This scale consists 
of two hundred and twenty-three items selected from a larger group 
of over six hundred by careful editing. This scale was then given to 
a group of 740 college students and from a comparison of the high- 
scoring fifty with the low-scoring fifty students, a list of the forty- 
two most differentiating items was drawn up. Thirty-nine of these 
were selected for special study (7). That the selection of these par- 
ticular items was not fortuitous is seen from the fact that Willoughby 
(11) lists eighteen of them among his selection of the best twenty- 
five items; Harvey lists twenty-three among his list of the best forty- 
five (5). Bernreuter (1) lists thirteen items having the highest 
weights on his B1-N and B3-I scales and all of them are among the 
thirty-nine under consideration. Only eight of the thirty-nine are 
omitted from mention in one of these three lists, and four of these are 
represented by highly similar items. 

These thirty-nine items were given to a group of five hundred
college freshmen, all men, in connection with the study cited (7). The
intercorrelations of each item with every other item were computed
with the aid of Chesire, Saffir and Thurstone’s computing diagrams
(2). Each item thus entered into thirty-eight correlation coefficients,
and there was a total of 741 distinct coefficients for the entire table.
A consideration of these thirty-nine items by the multiple factor
analysis methods of Thurstone (10) indicates that not one trait, as has
been inferred from item analysis, nor two, as in our hypothetical example,
but at least eight independent traits are measured by these thirty-nine
items (7).

A consideration of the intercorrelations of the items will serve 
to emphasize the point, keeping in mind that out of two hundred 
twenty-three items these are the thirty-nine having the highest item- 
test coefficients. The mean intercorrelation was found to be .23 with 
a standard error of .07. A coefficient of .20 is thus approximately 
three times its standard error, and hence is the lower limit of corre- 
lation coefficients which may be safely considered as due to influences 
other than sampling fluctuations. 

One item showed 31 out of 38 coefficients less than .20. Another 
item showed 12 negative coefficients, the largest numerically being 
more than three times its standard error. Only 15 of the items had 
more than half of their 38 possible intercorrelations more than three 
times their standard errors. Out of the entire table of 741 distinct in- 
tercorrelations, 390 were not significantly different from zero, and 77 
were negative. With so many zero and negative coefficients among 
the intercorrelations it is apparent that these items are not measur- 
ing the same trait. 
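
The tallies reported above are simple counts over the distinct entries of the
correlation table. As a minimal sketch of that bookkeeping (the function names
and the small four-item matrix are illustrative assumptions, not data from the
study):

```python
from itertools import combinations


def count_distinct_pairs(n_items):
    """Number of distinct inter-item coefficients among n_items items."""
    return n_items * (n_items - 1) // 2


def tally_coefficients(matrix, threshold):
    """Tally the coefficients above the diagonal of a symmetric matrix.

    Returns (total, below_threshold, negative), where below_threshold
    counts coefficients whose absolute value is under the threshold.
    """
    total = below = negative = 0
    for i, j in combinations(range(len(matrix)), 2):
        r = matrix[i][j]
        total += 1
        if abs(r) < threshold:
            below += 1
        if r < 0:
            negative += 1
    return total, below, negative


# Hypothetical four-item table, for illustration only.
toy = [
    [1.00, 0.55, 0.05, -0.12],
    [0.55, 1.00, 0.10, 0.02],
    [0.05, 0.10, 1.00, 0.48],
    [-0.12, 0.02, 0.48, 1.00],
]
toy_tally = tally_coefficients(toy, threshold=0.20)
pairs_for_39_items = count_distinct_pairs(39)
pairs_for_13_items = count_distinct_pairs(13)
```

The two pair counts correspond to the table of thirty-nine items and to the
subset of thirteen items considered in this article.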

It cannot be urged that these low intercorrelations were entirely 
due to the unreliability of the items themselves, for each item had a 
number of fairly high intercorrelations. Considering the highest in- 
tercorrelations in each column of the correlation matrix, we find that 
these ranged from .86 to .29, with a median value of .55. 

It must be remembered that these items were selected as being 
the best of more than two hundred carefully edited items, all of which 
themselves satisfied the criterion of internal consistency. If we turn 
to the thirteen that Bernreuter lists as having the highest weights on 
either the B1-N or B3-I scales — two scales of which he concludes 
(1): 

“On the basis of the evaluation furnished by the P. I. 
[Personality Inventory] test, it seems probable that neurotic 
tendency and introversion (in the B3-I sense) are names 
given to a single trait, whose real nature has been obscured 
by the inadequacies of the tests by which it has been esti- 
mated.” (Italics mine) 

we find that of the 78 independently computed intercorrelations, only
54 are significantly different from zero and three are
negative. It must be remembered that just as the thirty-nine items
previously considered represented the ‘best’ of the group of two hundred
twenty-three, so do these thirteen represent the ‘best’ (standardized
on a different group) of the thirty-nine. It would seem that the
nature of the “single trait” will continue to be obscured so long as we
continue to rely solely upon the criterion of internal consistency in
any of its variant forms, without investigating also the item intercorrelations
upon which Richardson has shown it is based.

Let us consider these results in the light of certain of Richardson’s
theorems. The first of these states that “In a test of uniform difficulty,
the correlation of an item with the test is proportional to the
average correlation of that item with each item of the test.” Selection
of an item with a high item-test correlation, and therefore a high
average intercorrelation, does not preclude selection of an item with
not one, but a large proportion of zero intercorrelations, since the
situation with which we are concerned (the measurement of two or
more independent traits in a single test) produces a very skewed
distribution of intercorrelation coefficients. The table below reproduces
the distributions of the first seven of the thirty-nine items and
three more extreme examples of what may occur. The first seven are
not atypical. Each column of the body of the table represents the
frequency distribution of the correlation coefficients of the item whose
code number appears at the top of the column with each of the other
thirty-eight items.


TABLE I

FREQUENCY DISTRIBUTIONS OF CORRELATION COEFFICIENTS
FOR CERTAIN TEST ITEMS

  Class                          Item Number
 Interval        1    2    3    4    5    6    7   11   19   34
 .80 to  .89     0    0    0    1    0    0    0    1    0    0
 .70 to  .79     1    0    0    0    0    0    0    0    0    1
 .60 to  .69     1    0    0    0    0    0    0    0    0    1
 .50 to  .59     1    2    0    0    0    1    2    0    1    4
 .40 to  .49     2    4    1    2    2    1    1    4    0    4
 .30 to  .39     6    4    8    4    5    5    5    5    2    4
 .20 to  .29     6    5   12   12    9    9    7   10    4   10
 .10 to  .19     9    3   10    7   11   13    7    3   16    6
 .00 to  .09    10   10    6    9    9    5   10    7   13    8
-.10 to -.01     4    7    1    2    2    4    4    7    2    0
-.20 to -.11     1    3    0    0    0    0    2    1    0    0
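
The contrast between a column's average and its most frequent class can be
checked with a little arithmetic. The sketch below uses the frequencies printed
for item 2 and treats every coefficient in a class as lying at the class
midpoint, which is an assumption made only for this illustration; the helper
names are likewise hypothetical.

```python
# Class intervals of Table I, highest first, as (lower, upper) bounds.
INTERVALS = [
    (0.80, 0.89), (0.70, 0.79), (0.60, 0.69), (0.50, 0.59),
    (0.40, 0.49), (0.30, 0.39), (0.20, 0.29), (0.10, 0.19),
    (0.00, 0.09), (-0.10, -0.01), (-0.20, -0.11),
]

# Frequencies printed in Table I for item 2 (38 coefficients in all).
ITEM_2 = [0, 0, 0, 2, 4, 4, 5, 3, 10, 7, 3]


def grouped_mean(freqs):
    """Mean of a grouped distribution, each class taken at its midpoint."""
    total = sum(freqs)
    weighted = sum(f * (lo + hi) / 2.0 for f, (lo, hi) in zip(freqs, INTERVALS))
    return weighted / total


def modal_interval(freqs):
    """Class interval with the largest frequency."""
    return INTERVALS[freqs.index(max(freqs))]


item_2_mean = grouped_mean(ITEM_2)
item_2_mode = modal_interval(ITEM_2)
```

Under the midpoint assumption the grouped mean for this column lies well above
its modal class, which is the lowest positive interval.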


The skewness of these distributions is obvious, and no measures 
of central tendency, dispersion or skewness are needed to show that 
an item may have a fairly high average intercorrelation (as item-test 
coefficients go) and yet have a modal inter-item coefficient of zero, indicating
that there is no single trait measured by all the items.

We have already considered the significance of Richardson’s
third theorem: “The item-test coefficient gives an indication of the
extent to which the item measures what the test as a whole meas- 
ures.” We have seen that this statement is quite correct, but that if 
“the test as a whole” is measuring the composite of two or more func- 
tions, we are led into difficulties, both theoretical and practical. 

A further statement of Richardson’s, not cast into the form of a 
theorem, but which follows as a corollary from theorem three is: 


“The procedures of item analysis .... will tend to make 
the test more pure or homogeneous, in the sense of conserv- 
ing those items which have the largest intercorrelations. This 
is the only sense in which it may be said that the conserved 
items are more ‘valid’ than the rejected items.”

We must suppose that by “largest intercorrelations” is meant largest
average intercorrelation, and we have just seen that an item may
have a high average intercorrelation and yet be statistically indepen- 
dent of other traits measured by the total test, due to the extreme 
skewness of the distribution of inter-correlation coefficients. 

Careful editing of the initial items which are to comprise the 
original criterion test is, needless to say, of great value in avoiding 
the difficulties which we have been considering. However, the thirty- 
nine items which we have been studying were very carefully selected 
by capable editors. Even a restriction of the breadth of the trait to
be measured is not a complete guarantee that enough extraneous items
to cause trouble will not creep into the criterion list. Willoughby
(11) classifies the two hundred twenty-three items into a number of
categories, of which only two are represented in the list of thirty-nine
items. Inspection of the intercorrelations of the items of the
thirty-nine which are in Willoughby’s “Fantasy” and “Social” categories
reveals essentially the same situation, though to a lesser degree, that
was found in the intercorrelations of the entire list. Furthermore,
inspection of the factor patterns of the items in these categories re- 
veals that items in the “Fantasy” category have loadings on seven of 
the eight factors, and “Social” requires six factors to account for the 
observed intercorrelations. 

These conclusions are not, as was stated at the outset, intended 
to controvert Richardson’s conclusions. Rather, they not only depend 
on them, but are implicit in them. They have been here made explicit, 
emphasized, and illustrated merely to point out a pitfall into which 
the application of the methods of internal consistency may lead, unless
definite precautions are taken to guard against it.

We have examined, first with an hypothetical situation and later
with actual data from items selected by the U-L method of internal 
consistency, the possibility of constructing a scale which measures a 
composite of two or more traits. We have seen the difficulty of avoid- 
ing the situation, either by repeated applications of the technique, or 
by careful editing of the items and a priori restriction of the trait; 
we have seen the difficulties which are caused by this composite trait, 
either in interpreting the nature of the trait, or in interpreting the 
test score for clinical purposes, and have seen that only by an exam- 
ination of the item intercorrelations themselves (and preferably by 
the methods of multiple factor analysis), can we be sure that we are 
not obscuring the nature of the trait “by the inadequacies of the test 
by which it has been measured.” Item analysis is a necessary, but not 
a sufficient condition for a good test.


A NOTE ON FACTOR ANALYSIS BY THE
METHOD OF PRINCIPAL AXES*


JOSEPH LEV 


The resolution of a set of variables into principal components 
may be performed upon variables having arbitrary variances, or 
upon variables obtained from these by reduction of all variances to 
unity. The two procedures do not yield the same results even if the 
principal components are reduced to unit variance in both cases, ex- 
cept when the original variances are all equal. 


The analysis of the matrix of intercorrelations among a set of
variables into uncorrelated factors by the method of principal axes
has been approached in two ways. In the work of Hotelling† the rotation
of axes is performed after the original variables have been reduced
so that their variances are all unity. On the other hand Kelley‡
performs this rotation without previous reduction to unit variance.

In reading the above articles the reader is likely to be led to be- 
lieve that the two procedures yield the same results. In this paper 
we shall point out a sense in which these results are not the same. We 
shall show that when the new and original variables are expressed in 
unit variance the ultimate relations differ in the two procedures. 

In the following we shall consider that the number of variables is
$n$ and that the range of all subscripts is $1, 2, \ldots, n$.

Let $x_i$ be the original measures having variances $v_i$ and covariances
$p_{ij}$. Performing a rotation to axes $\xi_i$ which coincide with the
principal axes of the ellipsoids of constant probability we obtain the
relations

$$\xi_i = a_{i1} x_1 + \cdots + a_{in} x_n, \tag{1}$$

where the matrix of the $a_{ij}$ is orthogonal, and the $\xi_i$ are uncorrelated
with variance $V_i$.

We can rewrite (1) in terms of variables whose variance is unity
by means of the transformations

$$z_i = \frac{x_i}{\sqrt{v_i}}, \tag{2}$$

$$\eta_i = \frac{\xi_i}{\sqrt{V_i}}, \tag{3}$$

which give

$$\eta_i = \frac{1}{\sqrt{V_i}} \left( a_{i1} \sqrt{v_1}\, z_1 + \cdots + a_{in} \sqrt{v_n}\, z_n \right). \tag{4}$$

*This article was written while the author was on a W.P.A. project at Teachers
College, Columbia University.

†Hotelling, H. “Analysis of a Complex of Statistical Variables into Principal
Components.” Journal of Educational Psychology, 1933, 24, 417-441, 498-520.
“Simplified Calculation of Principal Components.” Psychometrika, 1936, 1,
27-35. The first reference will be referred to as loc. cit.

‡Kelley, T. L. Essential Traits of Mental Life. Cambridge: Harvard Univ.
Press, 1935. P. 145.

We have obtained from (4) the result of a rotation of axes on
the unreduced variables $x_i$ followed by reduction to unit variance. We
shall now perform the rotation on the reduced variables. The result is

$$\zeta_i = \beta_{i1} z_1 + \cdots + \beta_{in} z_n, \tag{5}$$

where the matrix of the $\beta_{ij}$ is orthogonal, and the $\zeta_i$ are uncorrelated
and have variances equal to $k_i$.

By means of the transformation

$$\gamma_i = \frac{\zeta_i}{\sqrt{k_i}}, \tag{6}$$

we obtain a result similar to (4),

$$\gamma_i = \frac{1}{\sqrt{k_i}} \left( \beta_{i1} z_1 + \cdots + \beta_{in} z_n \right), \tag{7}$$

in which both the original and new variables have unit variance.

If the two procedures are to give identical results we should expect
that the matrices of the coefficients in (4) and (7) shall be the
same, i.e.

$$\frac{a_{ij} \sqrt{v_j}}{\sqrt{V_i}} = \frac{\beta_{ij}}{\sqrt{k_i}}. \tag{8}$$

We shall show that this is not the case.

According to the method of loc. cit. the $a_{ij}$ satisfy the $n^2$ equations

$$p_{i1} a_{h1} + \cdots + v_i a_{hi} + \cdots + p_{in} a_{hn} = V_h a_{hi}. \tag{9}$$

Likewise the $\beta_{ij}$ satisfy the equations

$$r_{i1} \beta_{h1} + \cdots + \beta_{hi} + \cdots + r_{in} \beta_{hn} = k_h \beta_{hi}. \tag{10}$$

On the assumption that (8) is true we can substitute for $a_{ij}$ in
(9) its value in terms of $\beta_{ij}$, and noting that

$$p_{ij} = \sqrt{v_i v_j}\, r_{ij},$$

we obtain

$$r_{i1} \beta_{h1} + \cdots + \beta_{hi} + \cdots + r_{in} \beta_{hn} = \frac{V_h}{v_i} \beta_{hi}. \tag{11}$$

Comparing (10) and (11) we see that the assumption in (8)
leads to the $n$ equations for each $h$,

$$\frac{V_h}{v_i} = k_h, \tag{12}$$

which can be true only if

$$v_1 = v_2 = \cdots = v_n = v, \tag{13}$$

and

$$V_h = v k_h. \tag{14}$$

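It is easy to see that (14) is a consequence of (13). From (13)
and the determinants

$$\begin{vmatrix} 1 & r_{12} & \cdots & r_{1n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & 1 \end{vmatrix}, \qquad \begin{vmatrix} v & v r_{12} & \cdots & v r_{1n} \\ \vdots & \vdots & & \vdots \\ v r_{n1} & v r_{n2} & \cdots & v \end{vmatrix},$$

we can express the $k_h$ as roots of the equation

$$(-1)^n \left( \lambda^n - n \lambda^{n-1} + S_2 \lambda^{n-2} - S_3 \lambda^{n-3} + \cdots \pm S_n \right) = 0, \tag{15}$$

and the $V_h$ as roots of the equation

$$(-1)^n \left( \lambda^n - v n \lambda^{n-1} + v^2 S_2 \lambda^{n-2} - v^3 S_3 \lambda^{n-3} + \cdots \pm v^n S_n \right) = 0, \tag{16}$$

where $S_2$ is the sum of the two-rowed principal minors of the first
determinant, $S_3$ is the sum of the three-rowed principal minors, and so on.
Evidently (14) follows from (13), (15), and (16).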


We can now show that (13) yields the result

$$a_{ij} = \beta_{ij}. \tag{17}$$

For the $a_{ij}$ satisfy the equations

$$v r_{i1} a_{h1} + \cdots + v a_{hi} + \cdots + v r_{in} a_{hn} = V_h a_{hi} = v k_h a_{hi},$$

and the $\beta_{ij}$ satisfy the equations

$$r_{i1} \beta_{h1} + \cdots + \beta_{hi} + \cdots + r_{in} \beta_{hn} = k_h \beta_{hi},$$

which are identical.

Using (14) and (17) we see that (8) is a consequence of (13).
It has been shown that (13) is a consequence of (8). The two results
can be combined in the following theorem.

The necessary and sufficient condition that the matrices of the
coefficients in (4) and (7) shall be the same is

$$v_1 = v_2 = \cdots = v_n.$$
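
The theorem can be checked numerically for two variables. The following is a
minimal sketch, not the author's procedure: it uses a closed-form solution for
a symmetric 2 x 2 matrix, and the function names and the particular variances
and covariance passed in are assumptions chosen for illustration.

```python
import math


def principal_axes_2x2(a, b, d):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]].

    Assumes b != 0, so that (lam - d, b) is a nonzero eigenvector.
    """
    half_trace = (a + d) / 2.0
    root = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    axes = []
    for lam in (half_trace + root, half_trace - root):
        wx, wy = lam - d, b
        norm = math.hypot(wx, wy)
        axes.append((lam, (wx / norm, wy / norm)))
    return axes


def unit_variance_coefficients(v1, v2, p12, reduce_first):
    """Coefficients on z_1, z_2 of the unit-variance components.

    reduce_first=False: rotate the covariance matrix, then reduce to unit
    variance, as in equation (4).
    reduce_first=True: reduce to unit variance first, then rotate the
    correlation matrix, as in equation (7).
    """
    s1, s2 = math.sqrt(v1), math.sqrt(v2)
    if reduce_first:
        axes = principal_axes_2x2(1.0, p12 / (s1 * s2), 1.0)
        scale = (1.0, 1.0)
    else:
        axes = principal_axes_2x2(v1, p12, v2)
        scale = (s1, s2)
    return [
        tuple(w * s / math.sqrt(lam) for w, s in zip(vec, scale))
        for lam, vec in axes
    ]


# Unequal variances: the two procedures give different coefficients.
unequal_rotate_first = unit_variance_coefficients(9.0, 4.0, 3.0, False)
unequal_reduce_first = unit_variance_coefficients(9.0, 4.0, 3.0, True)

# Equal variances: the two procedures agree.
equal_rotate_first = unit_variance_coefficients(4.0, 4.0, 2.0, False)
equal_reduce_first = unit_variance_coefficients(4.0, 4.0, 2.0, True)
```

Both calls in the second pair share a correlation of .5 with the first pair, so
only the equality or inequality of the variances differs between the two cases.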


We shall illustrate the result by means of a numerical example
where $v_1 = 9$, $v_2 = 4$, $p_{12} = 3$. Then

$$\eta_1 = .841 z_1 + .263 z_2,$$

$$\eta_2 = -.790 z_1 + 1.125 z_2,$$

and

$$\gamma_1 = \frac{1}{\sqrt{3}} (z_1 + z_2),$$

$$\gamma_2 = -z_1 + z_2.$$

The results are obviously quite different.


THE TECHNIQUE OF PATH COEFFICIENTS


MAX D. ENGELHART 
The Chicago City Junior Colleges 


A derivation of equations fundamental to the technique of path 
coefficients is given. Suggestions are made with respect to the cal- 
culations required in the use of the technique. The relations of the 
technique to those of partial correlation, semi-partial correlation, 
part correlation, multiple correlation, and factor analysis are dis- 
cussed. Some consideration is given to the merits and limitations of 
the technique of path coefficients. 


The technique of path coefficients is the contribution of Sewall 
Wright, who reported his derivations in 1921. (9) The following de- 
rivation represents an adaptation based upon proofs of a number of 
equalities given in an article by Dunlap and Cureton. (3) In this 
treatment a variable considered to be an effect of several others is 
labeled the “dependent” variable. The variables regarded as causes 
are labeled the “independent” variables, even though they may not be 
statistically independent, i.e., uncorrelated. 

Let us assume that all of the variance of $x_0$, a dependent variable,
is due to the correlated independent variables $x_1$ and $x_2$. Let the
element $a_{01.2}$ represent that part of variable $x_0$ determined by its
regression on $x_1$; and let $a_{02.1}$ represent that part of $x_0$ determined by its
regression on $x_2$. Then:†

$$x_0 = a_{01.2} + a_{02.1}$$

$$\frac{\sum x_0^2}{N} = \frac{\sum (a_{01.2} + a_{02.1})^2}{N}$$

$$\frac{\sum x_0^2}{N} = \frac{\sum a_{01.2}^2}{N} + \frac{\sum a_{02.1}^2}{N} + \frac{2 \sum a_{01.2}\, a_{02.1}}{N} \tag{1}$$


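†All variables are taken as measured from their respective means as origins.

By definition*

$$a_{01.2} = r_{0(1.2)} \frac{\sigma_0}{\sigma_{1.2}}\, x_1$$

$$a_{02.1} = r_{0(2.1)} \frac{\sigma_0}{\sigma_{2.1}}\, x_2$$

$$\frac{\sum a_{01.2}^2}{N} = r_{0(1.2)}^2 \frac{\sigma_0^2}{\sigma_{1.2}^2} \frac{\sum x_1^2}{N} = r_{0(1.2)}^2 \frac{\sigma_0^2}{\sigma_{1.2}^2}\, \sigma_1^2 = \sigma_{a_{01.2}}^2 \tag{2}$$

$$\frac{\sum a_{02.1}^2}{N} = r_{0(2.1)}^2 \frac{\sigma_0^2}{\sigma_{2.1}^2} \frac{\sum x_2^2}{N} = r_{0(2.1)}^2 \frac{\sigma_0^2}{\sigma_{2.1}^2}\, \sigma_2^2 = \sigma_{a_{02.1}}^2 \tag{3}$$

$$\frac{2 \sum a_{01.2}\, a_{02.1}}{N} = 2\, r_{0(1.2)} \frac{\sigma_0}{\sigma_{1.2}}\, r_{0(2.1)} \frac{\sigma_0}{\sigma_{2.1}} \frac{\sum x_1 x_2}{N}$$

Since

$$\frac{\sum x_1 x_2}{N} = r_{12}\, \sigma_1 \sigma_2,$$

we have

$$\frac{2 \sum a_{01.2}\, a_{02.1}}{N} = 2\, r_{0(1.2)} \frac{\sigma_0}{\sigma_{1.2}}\, \sigma_1\, r_{0(2.1)} \frac{\sigma_0}{\sigma_{2.1}}\, \sigma_2\, r_{12} = 2\, \sigma_{a_{01.2}} \sigma_{a_{02.1}} r_{12}. \tag{4}$$

Substituting from (2), (3), and (4) in (1) and writing $\sigma_0^2$, the
variance of $x_0$, for $\sum x_0^2 / N$:

$$\sigma_0^2 = \sigma_{a_{01.2}}^2 + \sigma_{a_{02.1}}^2 + 2\, \sigma_{a_{01.2}} \sigma_{a_{02.1}} r_{12}$$

*These equations are analogous to the ordinary regression equation for deviation
measures, i.e., $x_0 = r_{01} \frac{\sigma_0}{\sigma_1} x_1$. Here, $a_{01.2}$ represents a part of $x_0$
(determined by $x_1$ and independent of $x_2$); $r_{0(1.2)}$ represents the correlation
between $x_0$ and that part of $x_1$ which is independent of $x_2$; and $\sigma_{1.2}$
represents the corresponding partial standard deviation, i.e., the standard
deviation of that part of $x_1$ which contributes to $x_0$ independently of $x_2$.
$r_{0(1.2)}$ and $r_{0(2.1)}$ are coefficients of semi-partial correlation.

Dividing both sides of the equation by $\sigma_0^2$:

$$1 = \frac{\sigma_{a_{01.2}}^2}{\sigma_0^2} + \frac{\sigma_{a_{02.1}}^2}{\sigma_0^2} + 2\, \frac{\sigma_{a_{01.2}}}{\sigma_0} \frac{\sigma_{a_{02.1}}}{\sigma_0}\, r_{12} \tag{5}$$

Equation (5) may be written in terms of beta coefficients:

$$\beta_{01.2} = \frac{r_{01} - r_{02} r_{12}}{1 - r_{12}^2}$$

$$r_{0(1.2)} = \frac{r_{01} - r_{02} r_{12}}{\sqrt{1 - r_{12}^2}} = \beta_{01.2} \sqrt{1 - r_{12}^2} \tag{6}$$

$$\sigma_{1.2} = \sigma_1 \sqrt{1 - r_{12}^2} \tag{7}$$

Substituting (6) and (7) in (2)

$$\sigma_{a_{01.2}}^2 = \frac{\beta_{01.2}^2 (1 - r_{12}^2)\, \sigma_0^2 \sigma_1^2}{\sigma_1^2 (1 - r_{12}^2)} = \beta_{01.2}^2\, \sigma_0^2$$

$$\frac{\sigma_{a_{01.2}}^2}{\sigma_0^2} = \beta_{01.2}^2 \quad \text{and} \quad \frac{\sigma_{a_{01.2}}}{\sigma_0} = \beta_{01.2}$$

Similarly,

$$\frac{\sigma_{a_{02.1}}^2}{\sigma_0^2} = \beta_{02.1}^2 \quad \text{and} \quad \frac{\sigma_{a_{02.1}}}{\sigma_0} = \beta_{02.1}$$

Hence, equation (5) may be written:

$$1 = \beta_{01.2}^2 + \beta_{02.1}^2 + 2 \beta_{01.2} \beta_{02.1} r_{12} \tag{8}$$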


The first two terms of the right hand members of both equations
(5) and (8) are the variance ratios measuring the “direct” contributions
of $x_1$ and $x_2$ to the variance of $x_0$. The product term is a measure
of the joint contribution of $x_1$ and $x_2$ over and above the direct
contribution of each. In the above equation, all of the variance of $x_0$ is
accounted for. This proof may be extended to include $n$ independent
variables. There will be $n$ terms analogous to $\beta_{01.2}^2$, and as many product
terms as the number of possible pairs of intercorrelated independent
variables. The summation of all of these terms is equal to the
square of the coefficient of multiple correlation, $R_{0.12 \ldots n}^2$, the coefficient
of multiple determination. The effect of unknown or unmeasured
causes will be represented by a portion of the variance of
$x_0$ equal to $1 - R_{0.12 \ldots n}^2$, or the difference from unity of all of the
terms just referred to.
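
For two independent variables the decomposition can be computed directly from
the three correlations. The sketch below is illustrative only: the function
names and the correlation values are assumptions, and the beta coefficients are
taken from the standard two-predictor formula.

```python
def beta_two_predictors(r01, r02, r12):
    """Standardized regression coefficients of x_0 on x_1 and x_2."""
    denom = 1.0 - r12 ** 2
    beta_1 = (r01 - r02 * r12) / denom
    beta_2 = (r02 - r01 * r12) / denom
    return beta_1, beta_2


def variance_partition(r01, r02, r12):
    """Direct and joint contributions to the variance of x_0."""
    beta_1, beta_2 = beta_two_predictors(r01, r02, r12)
    direct_1 = beta_1 ** 2
    direct_2 = beta_2 ** 2
    joint = 2.0 * beta_1 * beta_2 * r12
    # Squared multiple correlation, computed independently as a check.
    r_squared = beta_1 * r01 + beta_2 * r02
    return {
        "direct_1": direct_1,
        "direct_2": direct_2,
        "joint": joint,
        "r_squared": r_squared,
        "unexplained": 1.0 - r_squared,
    }


# Hypothetical correlations, chosen only to exercise the formulas.
parts = variance_partition(r01=0.6, r02=0.5, r12=0.3)
explained = parts["direct_1"] + parts["direct_2"] + parts["joint"]
```

When the independent variables do not account for all of the variance, the
three terms sum to the squared multiple correlation rather than to unity, and
the remainder is the portion ascribed to unmeasured causes.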


The symbol $\sigma_{a_{01.23 \ldots n}} / \sigma_0$ represents a path coefficient and has been
shown to be equivalent to $\beta_{01.23 \ldots n}$. Sewall Wright suggested the portrayal
of the relations between a dependent variable and the independent
variables by means of diagrams, and specified certain rules for
writing simultaneous equations on the basis of the diagrams. From 
these could be calculated the path coefficients needed in the solution 
of a problem. Where certain of the independent variables were inter- 
correlated, no paths between them were indicated. Studies by Burks, 
(2) Heilman, (6) and Engelhart (4) illustrate this procedure. As 
a result of a suggestion made by the writer, Monroe and Stuit (7) 
reported that these equations simplify to the ordinary normal equa- 
tions long used in obtaining regression coefficients. 

The use of a diagram seems unnecessary to the writer. All of the 
independent variables may be assumed to be intercorrelated and the 
necessary beta coefficients obtained in the more usual ways. The sche- 
mas proposed by Griffin (5) may be recommended. 

After the beta coefficients have been obtained, the proportions of
variance of the dependent variable to be ascribed to the direct and
joint influences of the independent variables may be ascertained by
calculating

$$\beta_{01.234 \ldots n}^2, \quad \beta_{02.134 \ldots n}^2,$$

etc., and

$$2 \beta_{01.234 \ldots n}\, \beta_{02.134 \ldots n}\, r_{12}, \quad 2 \beta_{01.234 \ldots n}\, \beta_{03.124 \ldots n}\, r_{13}, \quad 2 \beta_{02.134 \ldots n}\, \beta_{03.124 \ldots n}\, r_{23},$$

etc. The results may be checked by comparing the summation of
these measures with the square of the corresponding coefficient of
multiple correlation. The results may be given a percent interpretation
by shifting the decimal points two places to the right.
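
The calculation and its check can be sketched for any number of independent
variables. In the illustration below the beta coefficients are obtained by
solving the normal equations with a small elimination routine; the routine, the
function names, and the three-variable correlations are all assumptions made
for the example rather than part of the procedure described above.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                for k in range(col, n + 1):
                    m[row][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def path_contributions(r_xx, r_x0):
    """Beta coefficients, direct terms, joint terms, and R squared.

    r_xx: intercorrelations of the independent variables.
    r_x0: their correlations with the dependent variable.
    """
    betas = solve_linear_system(r_xx, r_x0)
    n = len(betas)
    direct = [b * b for b in betas]
    joint = {
        (i, j): 2.0 * betas[i] * betas[j] * r_xx[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    }
    r_squared = sum(b * r for b, r in zip(betas, r_x0))
    return betas, direct, joint, r_squared


# Hypothetical correlations among three independent variables.
r_xx = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]]
r_x0 = [0.6, 0.5, 0.4]
betas, direct, joint, r_squared = path_contributions(r_xx, r_x0)
total = sum(direct) + sum(joint.values())
percentages = [round(100.0 * d, 1) for d in direct]
```

The comparison of `total` with `r_squared` is the check described in the text,
and `percentages` expresses the direct contributions as per cents.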

Heilman (6) and Engelhart (4) distributed the values obtained
for the joint contributions of given pairs of the independent variables
in proportion to the relative importance of these independent
variables, as shown by the measures of their direct contributions to
the variance of the dependent variable. The results were then combined
with the values of the direct contributions to indicate the total
contribution of each independent variable to the variance of the dependent
one. In addition, Heilman distributed the variance representing
the effect of unknown or unmeasured causes, $(1 - R_{0.12 \ldots n}^2)$,
according to the weights of the total contributions previously obtained
for the measured variables, which were classified as hereditary
and environmental. Thus all, or one hundred per cent, of the variance
of the dependent variable was distributed between heredity
and environment. The application of this procedure to measures pertaining
to the known variables is possibly justified as an estimate.
There is probably little justification for extending it into the unknown.
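
One reading of this apportionment, offered only as a sketch, splits each joint
contribution between the two variables of a pair in the ratio of their direct
contributions. The function name, the data layout, and the numbers are
assumptions; the studies cited may have differed in detail.

```python
def total_contributions(direct, joint):
    """Add to each direct contribution its share of the joint contributions.

    direct: list of direct contributions, one per independent variable.
    joint: dict mapping a pair of indices (i, j) to the joint contribution.
    Each joint term is divided in proportion to the two direct terms.
    """
    totals = list(direct)
    for (i, j), value in joint.items():
        share_i = direct[i] / (direct[i] + direct[j])
        totals[i] += value * share_i
        totals[j] += value * (1.0 - share_i)
    return totals


# Hypothetical contributions for two independent variables.
totals = total_contributions([0.24, 0.12], {(0, 1): 0.09})
```

The totals so obtained sum to the same quantity as the direct and joint
contributions taken together, so nothing is added to or removed from the
explained variance.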

Burks, (2) whose problem was the determining of the relative 
contribution of parental intelligence and environment to variance of 
child intelligence, transformed her results into coefficients of part de- 
termination. A coefficient of part correlation measures the relation 
between two variables, with the condition that estimates of only those 
parts of one or more other variables which are independent of the 
second variable are eliminated from the first variable. In this respect 
it differs from a coefficient of semi-partial correlation, in which esti- 
mates of the total influences of certain variables are eliminated from 
one of the two variables of which the net relationship is desired. It 
also differs from the ordinary partial correlation coefficient, in which 
estimates of the total influences of certain variables are removed from 
both of the two variables of which the net relationship is desired. In 
Burks’ study, the use of the coefficient of part determination seems 
appropriate, because she wished to include, as a part of the total con- 
tribution of parental intelligence, the indirect influence of parental 
intelligence on child intelligence through its effect on environment. 
Only the influence of that part of environment which is independent 
of parental intelligence should be excluded from child intelligence. 
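
The difference between the semi-partial and the ordinary partial coefficient
can be seen from the standard first-order formulas. The sketch below covers
only those two coefficients, not the coefficient of part correlation; the
function names and the correlation values are assumptions for illustration.

```python
import math


def semi_partial(r01, r02, r12):
    """Correlation of x_0 with the part of x_1 independent of x_2."""
    return (r01 - r02 * r12) / math.sqrt(1.0 - r12 ** 2)


def partial(r01, r02, r12):
    """Correlation of x_0 and x_1 with x_2 removed from both."""
    return (r01 - r02 * r12) / math.sqrt((1.0 - r02 ** 2) * (1.0 - r12 ** 2))


# Hypothetical correlations.
sp = semi_partial(0.6, 0.5, 0.3)
pr = partial(0.6, 0.5, 0.3)
```

Because the partial coefficient also removes the eliminated variable from the
first variable, its denominator is the smaller, and it is never less in
absolute value than the semi-partial coefficient.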

Burks (1) has shown that the coefficient of partial correlation
is not usually a precise measure of the relationship between two variables.
The variables eliminated should not be indirectly related,
through correlation with other variables, to the variables of which the
net relationship is sought. Monroe and Stuit, (7) on the basis of empirical
studies in which counts of coin tosses were used, have demonstrated
that the partial correlation technique operates with precision
only when the variable eliminated is a component variable, i.e., represents
a variable comparable to a factor such as one obtained in factor
analysis. The same limitation is probably true of attempts to analyze
the variance of given variables by means of the semi-partial or
part correlation techniques. Monroe and Stuit also demonstrated empirically
that a path-coefficient analysis is likely to be similarly lacking
in precision. To the extent that the independent variables are
intercorrelated, the summation of the values of the direct and joint
contributions, and the square of the coefficient of multiple correlation
fall short of explaining all of the variance actually due to the influence
of the independent variables. When, however, the independent
variables are not intercorrelated, the measures of their contributions
to the variance of the dependent variable are equivalent to the squares
of the factor loadings of factor analysis. Their sum is the communality
of the dependent variable.

Thus factor analysis and the path-coefficient technique give iden- 
tical results under the special condition of statistical independence of 
the independent variables; and the multiple regression equation is 
most effective as a means of predicting a criterion from given inde- 
pendent variables. The recent study of Roff (8) should be consulted 
in this connection. Where the variables studied are few in number and 
the interpretation will be more meaningful in terms of these variables 
rather than in terms of hypothetical orthogonal factors, the path- 
coefficient technique may be preferred. Where variables may be meas- 
ured with greater validity than in the fields of psychology or educa- 
tion — for example, in the field of economics — the technique of path 
coefficients may be more appropriate than the technique of factor 
analysis in studying certain problems.


J. P. GUILFORD. Psychometric Methods. New York: McGraw-Hill
Book Company, 1936. Pp. ix, 559.


The outstanding feature of this useful book is its inclusiveness. 
In addition to dealing with the more general statistical topics such as 
correlation, the normal distribution curve and curve-fitting, it con- 
tains chapters on the more important special psychological methods 
of a statistical nature, including the psychophysical methods, various 
scaling methods, and methods connected with mental tests, such as 
item validation and factor analysis. The interrelation of these meth- 
ods is emphasized, even though they are for the most part treated as 
distinct procedures with little attempt to picture them as develop- 
ments from a common set of fundamental principles. Such an at- 
tempt would undoubtedly require a reformulation of a number of the 
special methods, particularly the psychophysical methods, and entail 
a rather radical departure from precedent and tradition. It would also 
probably demand the definite formulation of the principles of a sys- 
tem of psychology in such a way as to make clear just what is to be 
understood by a psychological measurement. While not undertaking 
any such ambitious program, the author has nevertheless distinctly 
emphasized that the various methods described are all statistical in 
nature and have much in common. It may be hoped that this book 
signals the end of a period in which the psychometric methods have 
all too often been regarded as divided between separate and indepen- 
dent provinces belonging respectively to the experimental psycholo- 
gist and the student of mental tests. 

The treatment is pitched to the level of first year graduate stu- 
dents with no mathematical knowledge beyond one course in college 
algebra. The derivation or proof of formulae is almost entirely 
omitted and the emphasis is quite definitely practical, that is, upon 
the solution of psychological problems, rather than upon pure sta- 
tistics or mathematical theory. The book should be a popular one with 
the students for whom it is intended. 

As is probably wise in describing statistical methods, the author
adheres closely to authority and offers few original formulae. He
does, however, bring forth his own interesting substitute for Weber’s
law, namely, the power law, $\Delta R = K R^n$, a law which, to say the least,
has a much better chance of being correct than Weber’s law, for the
simple reason that by the proper choice of the value of $n$ it may be
made identical with either Weber’s law or the Cattell-Fullerton
square-root law (by taking $n$ as either 1 or .5), as well as with many
other laws.
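
The two special cases can be illustrated with a few lines of arithmetic. The
function name and the values of the constant and of the stimulus are
assumptions made for the example.

```python
def increment(stimulus, k, n):
    """Stimulus increment under the power law: delta R = K * R ** n."""
    return k * stimulus ** n


# n = 1: the increment is a constant fraction of the stimulus (Weber's law).
weber_small = increment(100.0, 0.1, 1.0)
weber_large = increment(400.0, 0.1, 1.0)

# n = .5: the increment grows as the square root of the stimulus.
root_small = increment(100.0, 0.1, 0.5)
root_large = increment(400.0, 0.1, 0.5)
```

Quadrupling the stimulus quadruples the increment when the exponent is 1 and
only doubles it when the exponent is .5.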

The subject of correlation is interestingly introduced in connection
with the topic of curve-fitting. As a consequence, in its first appearance
the formula for the coefficient of correlation takes the
form $\sigma_{y'} / \sigma_y$, in which $y'$ refers to the values predicted for $y$ from
$x$ by the equation of the best-fitting straight line.
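
That the ratio of the two standard deviations reproduces the size of the
correlation coefficient can be verified on a small set of paired values. The
sketch below assumes a least-squares line and uses made-up data; the helper
names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Standard deviation about the mean (divisor N)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(x, y):
    """Product-moment correlation of x and y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (std(x) * std(y))


def fitted_values(x, y):
    """Values of y predicted from x by the least-squares straight line."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [my + slope * (a - mx) for a in x]


# Made-up paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
ratio = std(fitted_values(x, y)) / std(y)
r = pearson_r(x, y)
```

The ratio is never negative, so it agrees with the coefficient in absolute
value; the sign must be taken from the slope of the line.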

Among the pleasing features of the book is a three-page frontis- 
piece consisting of portraits of twelve psychologists selected as among 
“those who have led the way in the building of a quantitative psy- 
chology”. The American psychologists so honored are Cattell, Kelley, 
Terman, Thorndike, and Thurstone. The up-to-date bibliographies 
at the end of each chapter also deserve mention. 


University of Illinois HERBERT WOODROW 





ANNOUNCEMENT 


A meeting of the Psychometric Society for Midwestern mem- 
bers has been arranged with the Executive Committee of the Society. 
This meeting is scheduled for Saturday, April 3rd, at Judson Court, 
University of Chicago. Members who wish to present papers should 
get in touch with Dr. Harold Gulliksen, Chairman Program Com- 
mittee, Board of Examinations, University of Chicago, Chicago, 
Illinois. Persons expecting to attend are requested to inform the 
Committee in charge of local arrangements. 


The Committee: (University of Chicago) 


L. L. Thurstone, Chairman 
Karl J. Holzinger 
M. W. Richardson